Working with Data
=================
Scalars
-------
.. code:: r

    n <- 3.14   # numeric (double)
    s <- 'c'    # character string
    b <- TRUE   # logical
.. code:: r

    typeof(n)
.. raw:: html
    'double'
.. code:: r

    typeof(s)
.. raw:: html
    'character'
.. code:: r

    typeof(b)
.. raw:: html
    'logical'
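
R has no separate scalar type: each value above is simply a vector of length one. A minimal sketch of how this can be checked (the values are illustrative only):

.. code:: r

    n <- 3.14
    length(n)               # 1: a "scalar" is a vector of length one
    is.vector(n)            # TRUE
    identical(n, c(3.14))   # TRUE: c() of a single value is the same thing

    # Whole numbers are stored as doubles unless the L suffix is used
    typeof(3)               # "double"
    typeof(3L)              # "integer"
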
Vectors
-------
Vectors are 1D collections of the same scalar type.
.. code:: r

    xs <- c(1, 0.5, 0.25)
    ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')
    # Spelled-out TRUE/FALSE are safer than T/F, which can be reassigned
    bs <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)
.. code:: r

    xs
.. raw:: html
    
    	- 1
 
    	- 0.5
 
    	- 0.25
 
    
.. code:: r

    ss
.. raw:: html
    
    	- 'G'
 
    	- 'A'
 
    	- 'T'
 
    	- 'T'
 
    	- 'A'
 
    	- 'C'
 
    	- 'A'
 
    
.. code:: r

    bs
.. raw:: html
    
    	- TRUE
 
    	- TRUE
 
    	- FALSE
 
    	- FALSE
 
    	- TRUE
 
    	- TRUE
 
    	- FALSE
 
    	- FALSE
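
Because a vector holds a single type, mixing types inside ``c()`` silently coerces everything to the most general type present. A short sketch of this behaviour (the names ``mixed`` and ``nums`` are illustrative, not part of the tutorial data):

.. code:: r

    # Numbers and logicals are converted to strings when a string is present
    mixed <- c(1, 'a', TRUE)
    typeof(mixed)   # "character"
    mixed           # "1" "a" "TRUE"

    # Logicals become 1 and 0 when combined with numbers
    nums <- c(1.5, TRUE, FALSE)
    typeof(nums)    # "double"
    nums            # 1.5 1.0 0.0
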
 
    
Extracting a single element
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: r

    xs[1]   # R indexes from 1, so this is the first element
.. raw:: html
    1
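
Indexing in R starts at 1, and a few related forms are worth knowing. The sketch below repeats the definition of ``xs`` so that it runs on its own:

.. code:: r

    xs <- c(1, 0.5, 0.25)

    xs[1]            # 1: the first element
    xs[length(xs)]   # 0.25: the last element
    xs[-1]           # 0.5 0.25: a negative index drops that position
    xs[4]            # NA: indexing past the end gives a missing value
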
Extracting elements with a position vector
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: r

    ss[2:5]   # 2:5 is the position vector c(2, 3, 4, 5)
.. raw:: html
    
    	- 'A'
 
    	- 'T'
 
    	- 'T'
 
    	- 'A'
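
A position vector does not have to be a contiguous range. As a brief sketch, again using the ``ss`` vector defined earlier (repeated here so the block is self-contained):

.. code:: r

    ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')

    ss[c(7, 1)]             # "A" "G": positions may be in any order
    ss[c(1, 1, 2)]          # "G" "G" "A": positions may repeat
    ss[seq(1, 7, by = 2)]   # "G" "T" "A" "A": every other element
    ss[-(2:5)]              # "G" "C" "A": drop positions 2 to 5
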
 
    
Extracting elements with a logical vector
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: r

    ss[bs]   # keeps the elements of ss at positions where bs is TRUE
.. raw:: html
    
    	- 'G'
 
    	- 'A'
 
    	- 'A'
 
    	- 'C'
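
Note that ``bs`` has eight elements while ``ss`` has only seven; the extraction above is clean only because the extra value is ``FALSE``. The following sketch, using a hypothetical vector ``x``, shows what happens when the lengths differ:

.. code:: r

    x <- c('a', 'b', 'c')

    x[c(TRUE, FALSE, TRUE)]         # "a" "c": lengths match
    x[c(TRUE, FALSE)]               # "a" "c": a shorter index is recycled
    x[c(TRUE, FALSE, TRUE, TRUE)]   # "a" "c" NA: a TRUE past the end gives NA

    # which() converts a logical vector to the equivalent positions
    which(c(TRUE, FALSE, TRUE))     # 1 3

To avoid surprises, it is usually safest to keep the logical vector the same length as the vector being indexed.
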
 
    
Extracting elements with a logical condition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: r

    ss[ss %in% c('A', 'T')]   # the condition yields a logical vector
.. raw:: html
    
    	- 'A'
 
    	- 'T'
 
    	- 'T'
 
    	- 'A'
 
    	- 'A'
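
The condition inside the brackets is itself a logical vector, which can be inspected or reused. A small sketch (the name ``keep`` is illustrative):

.. code:: r

    ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')
    xs <- c(1, 0.5, 0.25)

    keep <- ss %in% c('A', 'T')
    keep                         # FALSE TRUE TRUE TRUE TRUE FALSE TRUE
    sum(keep)                    # 5: TRUE counts as 1
    ss[keep]                     # same result as ss[ss %in% c('A', 'T')]

    # Comparison operators work the same way
    xs[xs < 1]                   # 0.5 0.25
    ss[ss == 'A' | ss == 'T']    # equivalent to the %in% form above
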
 
    
Matrices and Arrays
-------------------
Like vectors, but with two dimensions (matrices) or more (arrays).
.. code:: r

    m <- matrix(1:12, ncol=4)   # 3 rows x 4 columns, filled column by column
.. code:: r

    m
.. raw:: html
    
    | 1 | 4 | 7 | 10 | 
    | 2 | 5 | 8 | 11 | 
    | 3 | 6 | 9 | 12 | 
    
.. code:: r

    m[6:10]   # a single index treats the matrix as one column-major vector
.. raw:: html
    
    	- 6
 
    	- 7
 
    	- 8
 
    	- 9
 
    	- 10
 
    
.. code:: r

    m[m < 10]
.. raw:: html
    
    	- 1
 
    	- 2
 
    	- 3
 
    	- 4
 
    	- 5
 
    	- 6
 
    	- 7
 
    	- 8
 
    	- 9
 
    
.. code:: r

    m[2,]   # second row
.. raw:: html
    
    	- 2
 
    	- 5
 
    	- 8
 
    	- 11
 
    
.. code:: r

    m[,2]   # second column
.. raw:: html
    
    	- 4
 
    	- 5
 
    	- 6
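
The examples above rely on R filling matrices column by column. The sketch below illustrates that, along with a few related operations; the names ``m_by_row`` and ``a`` are illustrative only:

.. code:: r

    m <- matrix(1:12, ncol = 4)
    dim(m)                       # 3 4: rows, then columns
    m[2, 3]                      # 8: row 2, column 3

    # Extracting one row normally drops to a plain vector;
    # drop = FALSE keeps the result as a 1 x 4 matrix
    dim(m[2, , drop = FALSE])    # 1 4

    # byrow = TRUE fills across rows instead of down columns
    m_by_row <- matrix(1:12, ncol = 4, byrow = TRUE)
    m_by_row[1, ]                # 1 2 3 4

    # Arrays extend the same idea to three or more dimensions
    a <- array(1:24, dim = c(2, 3, 4))
    a[2, 3, 4]                   # 24: the last element
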
 
    
Work!
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Try to solve the following problems without searching the web. You can
use the built-in ``help()`` function.
Create the following :math:`3 \times 3` matrix and save it in a variable
called ``A``.

-  Row 1 = 4, 5, 6
-  Row 2 = 1, 2, 3
-  Row 3 = 7, 8, 9

What is the sum of all the numbers in ``A``?

Create a vector of the column sums in ``A`` using the ``colSums``
function.

Create a vector of the row sums in ``A`` using the ``apply`` function.

What is the sum of the numbers in the bottom right :math:`2 \times 2` block
(i.e., the numbers 2, 3, 8, 9)?
Lists
-----
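.. code:: r

    # Note: this variable shares its name with base R's ls() function;
    # calls to ls() still work because R looks up functions separately
    ls <- list(dna=ss, ispurine=ss %in% c('A', 'G'))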
.. code:: r

    ls
.. raw:: html
    
    	- $dna
 
    		
    	- 'G'
 
    	- 'A'
 
    	- 'T'
 
    	- 'T'
 
    	- 'A'
 
    	- 'C'
 
    	- 'A'
 
    
     
    	- $ispurine
 
    		
    	- TRUE
 
    	- TRUE
 
    	- FALSE
 
    	- FALSE
 
    	- TRUE
 
    	- FALSE
 
    	- TRUE
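
Unlike vectors, lists can hold elements of different types and lengths. A minimal sketch with a hypothetical record ``rec`` (the field names and values are invented for illustration):

.. code:: r

    rec <- list(name = 'geneX', lengths = c(10, 20, 30), coding = TRUE)

    names(rec)     # "name" "lengths" "coding"
    length(rec)    # 3: the number of elements, not of values inside them
    str(rec)       # compact summary of each element's type and contents

    # New elements can be added by name
    rec$note <- 'hypothetical example'
    length(rec)    # 4
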
 
    
     
    
Extracting a sublist from a list
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: r

    ls[1]   # single brackets return a list containing the first element
.. raw:: html
    $dna = 
    	- 'G'
 
    	- 'A'
 
    	- 'T'
 
    	- 'T'
 
    	- 'A'
 
    	- 'C'
 
    	- 'A'
 
    
.. code:: r

    class(ls[1])
.. raw:: html
    'list'
Extracting an element from a list
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: r

    ls$dna   # $ returns the element itself, here a character vector
.. raw:: html
    
    	- 'G'
 
    	- 'A'
 
    	- 'T'
 
    	- 'T'
 
    	- 'A'
 
    	- 'C'
 
    	- 'A'
 
    
.. code:: r

    class(ls$dna)
.. raw:: html
    'character'
.. code:: r

    ls[[1]]   # double brackets also return the element itself
.. raw:: html
    
    	- 'G'
 
    	- 'A'
 
    	- 'T'
 
    	- 'T'
 
    	- 'A'
 
    	- 'C'
 
    	- 'A'
 
    
.. code:: r

    class(ls[[1]])
.. raw:: html
    'character'
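
The difference between ``[``, ``[[`` and ``$`` can be summarised side by side. This sketch uses a small stand-in list named ``demo`` (an illustrative name) rather than the list above:

.. code:: r

    demo <- list(dna = c('G', 'A'), ispurine = c(TRUE, TRUE))

    sub <- demo[1]       # single brackets: a list of length one
    elt <- demo[[1]]     # double brackets: the element itself

    class(sub)           # "list"
    class(elt)           # "character"

    # [[ ]] with a name is equivalent to $
    identical(demo[['dna']], demo$dna)    # TRUE

    # Only single brackets can select several elements at once
    length(demo[c('dna', 'ispurine')])    # 2
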
Data frames
-----------
A data frame is a special list of vectors where all the vectors have the
same length. Because all the vectors have the same length, it can also
be thought of as a 2D table or matrix and manipulated in the same way.
.. code:: r

    # In R versions before 4.0, character columns become factors by default
    df <- as.data.frame(ls)
.. code:: r

    class(ls)
.. raw:: html
    'list'
.. code:: r

    class(df)
.. raw:: html
    'data.frame'
.. code:: r

    df
.. raw:: html
    
     | dna | ispurine | 
    
    	| 1 | G | TRUE | 
    	| 2 | A | TRUE | 
    	| 3 | T | FALSE | 
    	| 4 | T | FALSE | 
    	| 5 | A | TRUE | 
    	| 6 | C | FALSE | 
    	| 7 | A | TRUE | 
    
    
.. code:: r

    df[4:6, ]   # rows 4 to 6, all columns
.. raw:: html
    
     | dna | ispurine | 
    
    	| 4 | T | FALSE | 
    	| 5 | A | TRUE | 
    	| 6 | C | FALSE | 
    
    
.. code:: r

    df$ispurine
.. raw:: html
    
    	- TRUE
 
    	- TRUE
 
    	- FALSE
 
    	- FALSE
 
    	- TRUE
 
    	- FALSE
 
    	- TRUE
 
    
.. code:: r

    df[df$ispurine, ]   # rows where ispurine is TRUE, all columns
.. raw:: html
    
     | dna | ispurine | 
    
    	| 1 | G | TRUE | 
    	| 2 | A | TRUE | 
    	| 5 | A | TRUE | 
    	| 7 | A | TRUE | 
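
Since a data frame is both a list of equal-length vectors and a 2D table, list-style and matrix-style access both work. A self-contained sketch with a three-row stand-in named ``small`` (an illustrative name; ``stringsAsFactors = FALSE`` is set explicitly so the result does not depend on the R version):

.. code:: r

    small <- data.frame(dna = c('G', 'A', 'T'),
                        ispurine = c(TRUE, TRUE, FALSE),
                        stringsAsFactors = FALSE)

    # List-like behaviour: elements are columns
    length(small)        # 2
    names(small)         # "dna" "ispurine"
    small[['dna']]       # "G" "A" "T"

    # Matrix-like behaviour: rows and columns
    dim(small)           # 3 2
    small[2, 'dna']      # "A"
    nrow(small[small$dna == 'T', ])   # 1
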
    
    
Creating a data frame from scratch
----------------------------------
.. code:: r

    gender <- c('M', 'M', 'F', 'F', 'M', 'F', 'M')
    height <- c(1.65, 1.82, 1.56, 1.66, 1.72, 1.6, 1.8)
    weight <- c(65, 102, 55, 46, 78, 60, 72)

    # Each vector becomes a column named after the variable
    bods <- data.frame(gender, height, weight)
.. code:: r

    bods
.. raw:: html
    
     | gender | height | weight | 
    
    	| 1 | M | 1.65 | 65 | 
    	| 2 | M | 1.82 | 102 | 
    	| 3 | F | 1.56 | 55 | 
    	| 4 | F | 1.66 | 46 | 
    	| 5 | M | 1.72 | 78 | 
    	| 6 | F | 1.6 | 60 | 
    	| 7 | M | 1.8 | 72 | 
    
    
We can easily add a new calculated column. Let's include the body mass
index (BMI), computed as weight divided by height squared.
.. code:: r

    # ^ is applied before /, so this is weight / (height^2), row by row
    bods$bmi <- bods$weight/bods$height^2
.. code:: r

    bods
.. raw:: html
    
     | gender | height | weight | bmi | 
    
    	| 1 | M | 1.65 | 65 | 23.87511 | 
    	| 2 | M | 1.82 | 102 | 30.79338 | 
    	| 3 | F | 1.56 | 55 | 22.60026 | 
    	| 4 | F | 1.66 | 46 | 16.69328 | 
    	| 5 | M | 1.72 | 78 | 26.3656 | 
    	| 6 | F | 1.6 | 60 | 23.4375 | 
    	| 7 | M | 1.8 | 72 | 22.22222 | 
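
The calculation works because arithmetic on columns is applied element by element. The sketch below repeats it on the first two rows of the table; ``small`` and ``with_bmi`` are illustrative names:

.. code:: r

    height <- c(1.65, 1.82)
    weight <- c(65, 102)

    # Element-wise: each weight is divided by its own squared height
    bmi <- weight / height^2
    round(bmi, 2)                            # 23.88 30.79

    # ^ has higher precedence than /, so the parentheses are optional
    identical(bmi, weight / (height^2))      # TRUE

    # transform() is a base R alternative that returns a modified copy
    small <- data.frame(height, weight)
    with_bmi <- transform(small, bmi = weight / height^2)
    names(with_bmi)                          # "height" "weight" "bmi"
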
    
    
Let's get rid of the bmi column.
.. code:: r

    bods$bmi <- NULL   # assigning NULL removes the column
.. code:: r

    bods
.. raw:: html
    
     | gender | height | weight | 
    
    	| 1 | M | 1.65 | 65 | 
    	| 2 | M | 1.82 | 102 | 
    	| 3 | F | 1.56 | 55 | 
    	| 4 | F | 1.66 | 46 | 
    	| 5 | M | 1.72 | 78 | 
    	| 6 | F | 1.6 | 60 | 
    	| 7 | M | 1.8 | 72 | 
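
Assigning ``NULL`` modifies the data frame in place. When the original should be kept, a column can instead be left out by subsetting. A sketch with a hypothetical data frame ``d``:

.. code:: r

    d <- data.frame(a = 1:3, b = 4:6, c = 7:9)

    # Option 1: select every column whose name is not 'b'
    d_no_b <- d[, names(d) != 'b', drop = FALSE]
    names(d_no_b)    # "a" "c"

    # Option 2: list-style subsetting with setdiff()
    d_no_a <- d[setdiff(names(d), 'a')]
    names(d_no_a)    # "b" "c"

    # The original is unchanged in both cases
    names(d)         # "a" "b" "c"
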
    
    
Work!
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
How many males are there?

What is the mean height?

What is the mean weight for females?

A person is classified as obese if their BMI exceeds 30. Add the BMI
column back into the data frame, as well as a new logical column
``is.obese`` indicating whether a person is obese.
Reading data from files or URLs to dataframes
---------------------------------------------
See the examples on the Quick-R website.
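
As a minimal sketch of the idea, base R's ``read.csv`` returns a data frame. The example below reads from an inline string through the ``text`` argument so that it runs without any external file; the data and the name ``people`` are invented for illustration:

.. code:: r

    csv_text <- "gender,height,weight
    M,1.65,65
    F,1.56,55"

    # strip.white removes the indentation in front of each inline row
    people <- read.csv(text = csv_text, strip.white = TRUE,
                       stringsAsFactors = FALSE)

    class(people)    # "data.frame"
    names(people)    # "gender" "height" "weight"
    nrow(people)     # 2

To read a real file or URL, the path or address would be passed as the first argument instead, for example ``read.csv('path/to/file.csv')``, where the path shown is a placeholder.
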