Working with Data
=================
Scalars
-------
.. code:: python
n <- 3.14
s <- 'c'
b <- TRUE
.. code:: python
typeof(n)
.. raw:: html
'double'
.. code:: python
typeof(s)
.. raw:: html
'character'
.. code:: python
typeof(b)
.. raw:: html
'logical'
Vectors
-------
Vectors are 1D collections of the same scalar type.
.. code:: python
xs <- c(1, 0.5, 0.25)
ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')
bs <- c(T, T, F, F, T, T, F, F)
.. code:: python
xs
.. raw:: html
- 1
- 0.5
- 0.25
.. code:: python
ss
.. raw:: html
- 'G'
- 'A'
- 'T'
- 'T'
- 'A'
- 'C'
- 'A'
.. code:: python
bs
.. raw:: html
- TRUE
- TRUE
- FALSE
- FALSE
- TRUE
- TRUE
- FALSE
- FALSE
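The "same scalar type" rule can be seen by mixing types in ``c()``. The following is a minimal sketch (the names ``mixed`` and ``nums`` are made up for illustration) showing that R coerces the elements to a common type rather than raising an error.

.. code:: r

    # Mixing a number, a string and a logical: everything becomes character
    mixed <- c(1, 'a', TRUE)
    typeof(mixed)   # 'character'

    # Mixing numbers and logicals: TRUE/FALSE become 1/0
    nums <- c(1.5, TRUE, FALSE)
    typeof(nums)    # 'double'
    nums            # 1.5 1 0
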
Extracting a single element
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python
xs[1]
.. raw:: html
1
Extracting elements with a position vector
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python
ss[2:5]
.. raw:: html
- 'A'
- 'T'
- 'T'
- 'A'
Extracting elements with a logical vector
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python
ss[bs]
.. raw:: html
- 'G'
- 'A'
- 'A'
- 'C'
Extracting elements with a logical condition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python
ss[ss %in% c('A', 'T')]
.. raw:: html
- 'A'
- 'T'
- 'T'
- 'A'
- 'A'
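Two further indexing idioms are closely related to the extraction styles above and are sketched here as an aside: negative positions drop elements, and ``which()`` converts a logical condition into positions. The vector ``ss`` is redefined so the snippet runs on its own.

.. code:: r

    ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')

    # A negative position drops that element instead of selecting it
    ss[-1]             # 'A' 'T' 'T' 'A' 'C' 'A'

    # which() returns the positions where a condition is TRUE
    which(ss == 'A')   # 2 5 7

    # Negating a condition keeps everything that does not match
    ss[ss != 'T']      # 'G' 'A' 'A' 'C' 'A'
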
Matrices and Arrays
-------------------
Like vectors, only in two dimensions (matrices) or more (arrays).
.. code:: python
m <- matrix(1:12, ncol=4)
.. code:: python
m
.. raw:: html
| 1 | 4 | 7 | 10 |
| 2 | 5 | 8 | 11 |
| 3 | 6 | 9 | 12 |
.. code:: python
m[6:10]
.. raw:: html
- 6
- 7
- 8
- 9
- 10
.. code:: python
m[m < 10]
.. raw:: html
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
.. code:: python
m[2,]
.. raw:: html
- 2
- 5
- 8
- 11
.. code:: python
m[,2]
.. raw:: html
- 4
- 5
- 6
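The results above follow from how R stores a matrix: it is a vector with a ``dim`` attribute, filled column by column. The sketch below illustrates this, and uses the ``byrow`` argument of ``matrix`` to fill by rows instead (``m_by_row`` is a name chosen for this example).

.. code:: r

    m <- matrix(1:12, ncol = 4)

    dim(m)         # 3 4  (3 rows, 4 columns)
    as.vector(m)   # 1 2 ... 12, i.e. stored column by column
    m[2, 3]        # 8    (row 2, column 3)

    # Fill row by row instead of column by column
    m_by_row <- matrix(1:12, ncol = 4, byrow = TRUE)
    m_by_row[2, ]  # 5 6 7 8

This is why ``m[6:10]`` returns the 6th to 10th stored values rather than a rectangular block.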
Work!
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Try to solve the following problems without searching the web. You can
use the built-in ``help()`` function.
Create the following :math:`3 \times 3` matrix and save it in a variable
called ``A``.
- Row 1 = 4, 5, 6
- Row 2 = 1, 2, 3
- Row 3 = 7, 8, 9
What is the sum of all the numbers in ``A``?
Create a vector of the column sums in ``A`` using the ``colSums``
function.
Create a vector of the row sums in ``A`` using the ``apply`` function.
What is the sum of the numbers in the bottom-right :math:`2 \times 2` block
(i.e., the numbers 2, 3, 8, 9)?
Lists
-----
.. code:: python
ls <- list(dna=ss, ispurine=ss %in% c('A', 'G'))
.. code:: python
ls
.. raw:: html
- $dna
- 'G'
- 'A'
- 'T'
- 'T'
- 'A'
- 'C'
- 'A'
- $ispurine
- TRUE
- TRUE
- FALSE
- FALSE
- TRUE
- FALSE
- TRUE
Extracting a sublist from a list
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python
ls[1]
.. raw:: html
$dna =
- 'G'
- 'A'
- 'T'
- 'T'
- 'A'
- 'C'
- 'A'
.. code:: python
class(ls[1])
.. raw:: html
'list'
Extracting an element from a list
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python
ls$dna
.. raw:: html
- 'G'
- 'A'
- 'T'
- 'T'
- 'A'
- 'C'
- 'A'
.. code:: python
class(ls$dna)
.. raw:: html
'character'
.. code:: python
ls[[1]]
.. raw:: html
- 'G'
- 'A'
- 'T'
- 'T'
- 'A'
- 'C'
- 'A'
.. code:: python
class(ls[[1]])
.. raw:: html
'character'
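The difference between single and double brackets can be checked directly. In this small sketch, which rebuilds the same list so that it runs on its own, ``[`` always returns a list (possibly with several elements), while ``[[`` and ``$`` return the element itself.

.. code:: r

    ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')
    ls <- list(dna = ss, ispurine = ss %in% c('A', 'G'))

    # Single brackets: a list of length 1 that wraps the vector
    length(ls[1])       # 1

    # Double brackets: the vector itself, with 7 elements
    length(ls[[1]])     # 7

    # Elements can also be looked up by name and then indexed further
    ls[['dna']][2]      # 'A'

    # Single brackets can select several elements at once
    names(ls[c('dna', 'ispurine')])   # 'dna' 'ispurine'
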
Data frames
-----------
A data frame is a special list of vectors where all the vectors have the
same length. Because all the vectors have the same length, it can also
be thought of as a 2D table or matrix and manipulated in the same way.
.. code:: python
df <- as.data.frame(ls)
.. code:: python
class(ls)
.. raw:: html
'list'
.. code:: python
class(df)
.. raw:: html
'data.frame'
.. code:: python
df
.. raw:: html
| dna | ispurine |
1 | G | TRUE |
2 | A | TRUE |
3 | T | FALSE |
4 | T | FALSE |
5 | A | TRUE |
6 | C | FALSE |
7 | A | TRUE |
.. code:: python
df[4:6, ]
.. raw:: html
| dna | ispurine |
4 | T | FALSE |
5 | A | TRUE |
6 | C | FALSE |
.. code:: python
df$ispurine
.. raw:: html
- TRUE
- TRUE
- FALSE
- FALSE
- TRUE
- FALSE
- TRUE
.. code:: python
df[df$ispurine, ]
.. raw:: html
| dna | ispurine |
1 | G | TRUE |
2 | A | TRUE |
5 | A | TRUE |
7 | A | TRUE |
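Because a data frame is both a list of columns and a 2D table, the same value can be reached with either style of indexing. The sketch below rebuilds ``df`` with ``data.frame`` so that it is self-contained; ``stringsAsFactors = FALSE`` is passed as a precaution so the ``dna`` column stays a character vector on older versions of R.

.. code:: r

    ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')
    df <- data.frame(dna = ss, ispurine = ss %in% c('A', 'G'),
                     stringsAsFactors = FALSE)

    is.list(df)       # TRUE: a data frame is a list of columns
    dim(df)           # 7 2:  it also has rows and columns

    # List-style access: take the column, then the 3rd element
    df[['dna']][3]    # 'T'

    # Matrix-style access: row 3, column 'dna'
    df[3, 'dna']      # 'T'

    # A logical condition on a column selects rows
    nrow(df[df$dna == 'A', ])   # 3
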
Creating a data frame from scratch
----------------------------------
.. code:: python
gender <- c('M', 'M', 'F', 'F', 'M', 'F', 'M')
height <- c(1.65, 1.82, 1.56, 1.66, 1.72, 1.6, 1.8)
weight <- c(65, 102, 55, 46, 78, 60, 72)
bods <- data.frame(gender, height, weight)
.. code:: python
bods
.. raw:: html
| gender | height | weight |
1 | M | 1.65 | 65 |
2 | M | 1.82 | 102 |
3 | F | 1.56 | 55 |
4 | F | 1.66 | 46 |
5 | M | 1.72 | 78 |
6 | F | 1.6 | 60 |
7 | M | 1.8 | 72 |
We can add a new calculated column easily. Let's include the body mass
index (BMI), which is the weight in kilograms divided by the square of the
height in metres.
.. code:: python
bods$bmi <- bods$weight/bods$height^2
.. code:: python
bods
.. raw:: html
| gender | height | weight | bmi |
1 | M | 1.65 | 65 | 23.87511 |
2 | M | 1.82 | 102 | 30.79338 |
3 | F | 1.56 | 55 | 22.60026 |
4 | F | 1.66 | 46 | 16.69328 |
5 | M | 1.72 | 78 | 26.3656 |
6 | F | 1.6 | 60 | 23.4375 |
7 | M | 1.8 | 72 | 22.22222 |
Let's get rid of the bmi column.
.. code:: python
bods$bmi <- NULL
.. code:: python
bods
.. raw:: html
| gender | height | weight |
1 | M | 1.65 | 65 |
2 | M | 1.82 | 102 |
3 | F | 1.56 | 55 |
4 | F | 1.66 | 46 |
5 | M | 1.72 | 78 |
6 | F | 1.6 | 60 |
7 | M | 1.8 | 72 |
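Assigning to ``bods$bmi`` modifies the data frame in place. As an alternative sketch, base R's ``transform`` returns a modified copy and leaves the original untouched, and a logical test on ``names()`` drops a column without assigning ``NULL``. The names ``with_bmi`` and ``without_weight`` are made up for this example.

.. code:: r

    gender <- c('M', 'M', 'F', 'F', 'M', 'F', 'M')
    height <- c(1.65, 1.82, 1.56, 1.66, 1.72, 1.6, 1.8)
    weight <- c(65, 102, 55, 46, 78, 60, 72)
    bods <- data.frame(gender, height, weight)

    # transform() evaluates the expression using the columns of bods
    with_bmi <- transform(bods, bmi = weight / height^2)
    names(with_bmi)   # 'gender' 'height' 'weight' 'bmi'
    names(bods)       # 'gender' 'height' 'weight' (unchanged)

    # Keep every column except 'weight'
    without_weight <- with_bmi[, names(with_bmi) != 'weight']
    names(without_weight)   # 'gender' 'height' 'bmi'
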
Work!
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
How many males are there?
What is the mean height?
What is the mean weight for females?
A person is classified as obese if their BMI exceeds 30. Add the BMI
column back into the data frame, as well as a new logical column
``is.obese`` indicating if a person is obese or not.
Reading data from files or URLs to dataframes
---------------------------------------------
See the examples on the Quick-R website.
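As a minimal sketch of the idea, ``read.csv`` turns comma-separated text into a data frame. Here the data come from an inline string and from a temporary file so that the example runs without any external resource; the same function also accepts a path or URL as its first argument. The names ``csv_text``, ``bods_small``, ``tmp`` and ``from_file`` are made up for illustration.

.. code:: r

    # Three lines of CSV: a header row followed by two records
    csv_text <- paste('gender,height,weight',
                      'M,1.65,65',
                      'F,1.56,55',
                      sep = '\n')

    # Read directly from the string
    bods_small <- read.csv(text = csv_text)
    dim(bods_small)      # 2 3
    names(bods_small)    # 'gender' 'height' 'weight'

    # Round trip through a temporary file on disk
    tmp <- tempfile(fileext = '.csv')
    write.csv(bods_small, tmp, row.names = FALSE)
    from_file <- read.csv(tmp)
    dim(from_file)       # 2 3
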