Working with Data
=================

Scalars
-------

.. code:: r

    n <- 3.14
    s <- 'c'
    b <- TRUE

.. code:: r

    typeof(n)

::

    'double'

.. code:: r

    typeof(s)

::

    'character'

.. code:: r

    typeof(b)

::

    'logical'

Vectors
-------

Vectors are one-dimensional collections of values that all share the same scalar type.

.. code:: r

    xs <- c(1, 0.5, 0.25)
    ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')
    bs <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)

.. code:: r

    xs

::

    1. 1
    2. 0.5
    3. 0.25

.. code:: r

    ss

::

    1. 'G'
    2. 'A'
    3. 'T'
    4. 'T'
    5. 'A'
    6. 'C'
    7. 'A'

.. code:: r

    bs

::

    1. TRUE
    2. TRUE
    3. FALSE
    4. FALSE
    5. TRUE
    6. TRUE
    7. FALSE
    8. FALSE
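
Every element of a vector must share one scalar type, so ``c()`` silently converts mixed inputs to the most general type involved (logical, then integer, then double, then character). The short sketch below shows this coercion. The names ``nums`` and ``mixed`` are only illustrative.

.. code:: r

    # c() coerces mixed inputs to one common type (illustrative names)
    nums <- c(1, TRUE, FALSE)    # logicals become 1 and 0
    typeof(nums)                 # "double"

    mixed <- c(1, 'A', TRUE)     # everything becomes a string
    typeof(mixed)                # "character"
    mixed                        # "1" "A" "TRUE"
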

Extracting a single element
~~~~~~~~~~~~~~~~~~~~~~~~~~~

R counts positions from 1, so ``xs[1]`` is the first element.

.. code:: r

    xs[1]

::

    1

Extracting elements with a position vector
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: r

    ss[2:5]

::

    1. 'A'
    2. 'T'
    3. 'T'
    4. 'A'
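
A position vector does not have to be a contiguous range. ``2:5`` is just shorthand for ``c(2, 3, 4, 5)``, and positions may appear in any order or more than once. A quick check, where ``picked`` is an illustrative name:

.. code:: r

    ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')

    # 2:5 and c(2, 3, 4, 5) select the same elements
    identical(ss[2:5], ss[c(2, 3, 4, 5)])   # TRUE

    # Positions can be reordered or repeated
    picked <- ss[c(7, 1, 1)]
    picked                                  # "A" "G" "G"
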

Extracting elements with a logical vector
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: r

    ss[bs]

::

    1. 'G'
    2. 'A'
    3. 'A'
    4. 'C'
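
Logical indexing keeps the elements at the positions where the index is ``TRUE``. The base function ``which()`` converts such a logical vector into the equivalent position vector, which makes the link between the two indexing styles explicit. The minimal sketch below assumes the same ``ss`` and ``bs`` as above, and ``pos`` is an illustrative name.

.. code:: r

    ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')
    bs <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)

    # Positions of the TRUE values (illustrative name)
    pos <- which(bs)
    pos                            # 1 2 5 6

    # Indexing by the logical vector or by its positions gives the same result
    identical(ss[bs], ss[pos])     # TRUE
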

Extracting elements with a logical condition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: r

    ss[ss %in% c('A', 'T')]

::

    1. 'A'
    2. 'T'
    3. 'T'
    4. 'A'
    5. 'A'
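
The condition inside the brackets is itself just a logical vector with one value per element. You can therefore store, inspect, or summarise it before using it as an index. For example, ``sum()`` counts the ``TRUE`` values. The name ``cond`` is only illustrative.

.. code:: r

    ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')

    cond <- ss %in% c('A', 'T')   # illustrative name
    cond                          # FALSE TRUE TRUE TRUE TRUE FALSE TRUE
    ss[cond]                      # "A" "T" "T" "A" "A"

    # Comparisons such as == also return logical vectors
    sum(ss == 'A')                # 3
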

Matrices and Arrays
-------------------

Like vectors, but with two dimensions (matrices) or more (arrays). As with vectors, all elements must share the same scalar type.

.. code:: r

    m <- matrix(1:12, ncol=4)

.. code:: r

    m

::

    1  4  7  10
    2  5  8  11
    3  6  9  12
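
Because ``matrix()`` fills its values column by column, indexing a matrix with a single position vector counts down the columns.

.. code:: r

    m[6:10]

::

    1. 6
    2. 7
    3. 8
    4. 9
    5. 10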
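
Since elements are stored column by column, element ``m[i, j]`` sits at linear position ``(j - 1) * nrow(m) + i``. The check below confirms this for one element. ``i`` and ``j`` are arbitrary example indices.

.. code:: r

    m <- matrix(1:12, ncol=4)
    dim(m)                        # 3 4

    i <- 2   # example row
    j <- 3   # example column
    # Row/column indexing and the equivalent linear index pick the same element
    m[i, j]                       # 8
    m[(j - 1) * nrow(m) + i]      # 8
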

.. code:: r

    m[m < 10]

::

    1. 1
    2. 2
    3. 3
    4. 4
    5. 5
    6. 6
    7. 7
    8. 8
    9. 9
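
Logical indexing of a matrix returns a plain vector, not a matrix. To find out *where* the matches are, call ``which()`` with ``arr.ind = TRUE``, which returns their row and column positions. ``sel`` and ``idx`` are illustrative names.

.. code:: r

    m <- matrix(1:12, ncol=4)

    sel <- m[m < 10]
    is.matrix(sel)                          # FALSE: a plain vector of 9 values

    # One row per match, with columns "row" and "col"
    idx <- which(m < 10, arr.ind = TRUE)
    head(idx, 3)
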
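
Leaving an index empty selects everything along that dimension. ``m[2, ]`` returns the second row and ``m[, 2]`` returns the second column.

.. code:: r

    m[2, ]

::

    1. 2
    2. 5
    3. 8
    4. 11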

.. code:: r

    m[, 2]

::

    1. 4
    2. 5
    3. 6
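
As the outputs above show, selecting a single row or column drops the result down to a vector. Passing ``drop = FALSE`` keeps the matrix shape. The same bracket notation extends to arrays with three or more dimensions, created here with ``array()``. The dimensions ``c(2, 3, 4)`` are an arbitrary choice for illustration.

.. code:: r

    m <- matrix(1:12, ncol=4)

    # Keep the second row as a 1 x 4 matrix instead of a vector
    row2 <- m[2, , drop = FALSE]
    dim(row2)                     # 1 4

    # A 2 x 3 x 4 array, filled in the same column-major order
    a <- array(1:24, dim = c(2, 3, 4))
    a[2, 3, 4]                    # 24
    dim(a[, , 1])                 # 2 3
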

Work!
~~~~~

Try to solve the following problems without searching the web. You can use the built-in ``help()`` function.

Create the following :math:`3 \times 3` matrix and save it in a variable called ``A``.

- Row 1 = 4, 5, 6
- Row 2 = 1, 2, 3
- Row 3 = 7, 8, 9

What is the sum of all the numbers in ``A``?

Create a vector of the column sums in ``A`` using the ``colSums`` function.

Create a vector of the row sums in ``A`` using the ``apply`` function.

What is the sum of the numbers in the bottom-right :math:`2 \times 2` block (i.e., the numbers 2, 3, 8 and 9)?

Lists
-----

A list is an ordered collection of named or unnamed elements. Unlike a vector, its elements may have different types and lengths.

.. code:: r

    # ispurine is TRUE where the base is a purine (A or G)
    ls <- list(dna=ss, ispurine=ss %in% c('A', 'G'))

.. code:: r

    ls

::

    $dna
      1. 'G'
      2. 'A'
      3. 'T'
      4. 'T'
      5. 'A'
      6. 'C'
      7. 'A'
    $ispurine
      1. TRUE
      2. TRUE
      3. FALSE
      4. FALSE
      5. TRUE
      6. FALSE
      7. TRUE
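
To show that a list can mix types and lengths, the small sketch below builds a hypothetical record. The element names ``id``, ``dna`` and ``note`` are made up for illustration. ``lengths()`` returns the length of each element.

.. code:: r

    ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')

    # Hypothetical record mixing an integer, a character vector and a string
    rec <- list(id = 42L, dna = ss, note = 'toy sequence')

    sapply(rec, class)    # id "integer", dna "character", note "character"
    lengths(rec)          # id 1, dna 7, note 1
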

Extracting a sublist from a list
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Single brackets return a list containing only the selected elements.

.. code:: r

    ls[1]

::

    $dna
      1. 'G'
      2. 'A'
      3. 'T'
      4. 'T'
      5. 'A'
      6. 'C'
      7. 'A'

.. code:: r

    class(ls[1])

::

    'list'

Extracting an element from a list
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``$`` with a name and double brackets ``[[ ]]`` both return the element itself rather than a one-element list.

.. code:: r

    ls$dna

::

    1. 'G'
    2. 'A'
    3. 'T'
    4. 'T'
    5. 'A'
    6. 'C'
    7. 'A'

.. code:: r

    class(ls$dna)

::

    'character'

.. code:: r

    ls[[1]]

::

    1. 'G'
    2. 'A'
    3. 'T'
    4. 'T'
    5. 'A'
    6. 'C'
    7. 'A'
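
``$`` only works with a name written literally in the code. When the element's name is stored in a variable, use ``[[`` with that variable instead. ``key`` and ``by_name`` below are only illustrative names.

.. code:: r

    ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')
    ls <- list(dna=ss, ispurine=ss %in% c('A', 'G'))

    key <- 'ispurine'                 # element name held in a variable
    by_name <- ls[[key]]
    identical(by_name, ls$ispurine)   # TRUE

    # ls$key looks for an element literally named "key", so it returns NULL
    is.null(ls$key)                   # TRUE
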

.. code:: r

    class(ls[[1]])

::

    'character'

Data frames
-----------

A data frame is a special list of vectors (its columns) that all have the same length. Because the columns line up, a data frame can also be treated as a 2D table and indexed with ``[row, column]``, just like a matrix.

.. code:: r

    df <- as.data.frame(ls)

.. code:: r

    class(ls)

::

    'list'

.. code:: r

    class(df)

::

    'data.frame'

.. code:: r

    df

::

      dna ispurine
    1   G     TRUE
    2   A     TRUE
    3   T    FALSE
    4   T    FALSE
    5   A     TRUE
    6   C    FALSE
    7   A     TRUE
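
To see both sides of a data frame's nature, the sketch below rebuilds ``df`` and indexes it once like a matrix and once like a list. ``stringsAsFactors = FALSE`` is set explicitly so that the ``dna`` column stays a character vector in older R versions as well. This has been the default since R 4.0.0.

.. code:: r

    ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')
    df <- data.frame(dna = ss, ispurine = ss %in% c('A', 'G'),
                     stringsAsFactors = FALSE)

    dim(df)            # 7 2: rows x columns, as for a matrix
    df[2, 'dna']       # "A": [row, column] indexing
    is.list(df)        # TRUE: still a list of column vectors
    df[['ispurine']]   # list-style extraction of one column
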

.. code:: r

    df[4:6, ]

::

      dna ispurine
    4   T    FALSE
    5   A     TRUE
    6   C    FALSE

.. code:: r

    df$ispurine

::

    1. TRUE
    2. TRUE
    3. FALSE
    4. FALSE
    5. TRUE
    6. FALSE
    7. TRUE

A logical vector in the row position keeps the rows where it is ``TRUE``.

.. code:: r

    df[df$ispurine, ]

::

      dna ispurine
    1   G     TRUE
    2   A     TRUE
    5   A     TRUE
    7   A     TRUE
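
You can combine row conditions with ``&`` (and), ``|`` (or) and ``!`` (not). As a small illustration, the sketch below selects the purine rows that are not ``'A'``, and then the non-purine rows. ``g_rows`` and ``pyr`` are illustrative names.

.. code:: r

    ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')
    df <- data.frame(dna = ss, ispurine = ss %in% c('A', 'G'),
                     stringsAsFactors = FALSE)

    # Purines other than A (only row 1, the G)
    g_rows <- df[df$ispurine & df$dna != 'A', ]
    # Rows that are not purines (the T and C bases)
    pyr <- df[!df$ispurine, ]
    nrow(pyr)          # 3
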

Creating a data frame from scratch
----------------------------------

.. code:: r

    gender <- c('M', 'M', 'F', 'F', 'M', 'F', 'M')
    height <- c(1.65, 1.82, 1.56, 1.66, 1.72, 1.6, 1.8)  # metres
    weight <- c(65, 102, 55, 46, 78, 60, 72)             # kilograms
    bods <- data.frame(gender, height, weight)

.. code:: r

    bods

::

      gender height weight
    1      M   1.65     65
    2      M   1.82    102
    3      F   1.56     55
    4      F   1.66     46
    5      M   1.72     78
    6      F    1.6     60
    7      M    1.8     72
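
``data.frame()`` enforces the equal-length rule for columns. In the minimal check below, the column names ``a`` and ``b`` are throwaway examples. Vectors of lengths 3 and 2 cannot form a rectangular table, so the call raises an error, which ``tryCatch()`` turns into its message string.

.. code:: r

    # Columns of lengths 3 and 2 cannot form a rectangular table
    res <- tryCatch(
      data.frame(a = 1:3, b = 1:2),
      error = function(e) conditionMessage(e)
    )
    res        # the error message about differing numbers of rows

    # Equal lengths work as expected
    ok <- data.frame(a = 1:3, b = c('x', 'y', 'z'))
    dim(ok)    # 3 2
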

Adding a calculated column is easy. Here we add the body mass index (BMI), defined as weight in kilograms divided by the square of height in metres.

.. code:: r

    bods$bmi <- bods$weight/bods$height^2

.. code:: r

    bods

::

      gender height weight      bmi
    1      M   1.65     65 23.87511
    2      M   1.82    102 30.79338
    3      F   1.56     55 22.60026
    4      F   1.66     46 16.69328
    5      M   1.72     78  26.3656
    6      F    1.6     60  23.4375
    7      M    1.8     72 22.22222
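
The BMI column can be computed in one line because arithmetic in R is vectorised: ``/`` and ``^`` act element by element and produce one BMI per person. Here is the same formula applied to the original vectors, rounded with ``round()`` for display:

.. code:: r

    height <- c(1.65, 1.82, 1.56, 1.66, 1.72, 1.6, 1.8)   # metres
    weight <- c(65, 102, 55, 46, 78, 60, 72)               # kilograms

    bmi <- weight / height^2    # element-wise, one value per person
    round(bmi, 1)               # 23.9 30.8 22.6 16.7 26.4 23.4 22.2
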

To remove the ``bmi`` column again, assign ``NULL`` to it.

.. code:: r

    bods$bmi <- NULL

.. code:: r

    bods

::

      gender height weight
    1      M   1.65     65
    2      M   1.82    102
    3      F   1.56     55
    4      F   1.66     46
    5      M   1.72     78
    6      F    1.6     60
    7      M    1.8     72

Work!
~~~~~

How many males are there?

What is the mean height?

What is the mean weight for females?

A person is classified as obese if their BMI exceeds 30. Add the BMI column back into the data frame, along with a new logical column ``is.obese`` indicating whether each person is obese.

Reading data from files or URLs into data frames
------------------------------------------------

See `Examples from the Quick-R website `__
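
As a self-contained sketch of the round trip, the code below writes a small data frame to a temporary CSV file with ``write.csv()`` and reads it back with ``read.csv()``. The path comes from ``tempfile()`` and the data is a two-row excerpt of ``bods``. ``read.csv()`` also accepts a URL in place of a local file path.

.. code:: r

    bods <- data.frame(gender = c('M', 'F'), height = c(1.65, 1.56),
                       weight = c(65, 55))

    path <- tempfile(fileext = '.csv')          # temporary file for the demo
    write.csv(bods, path, row.names = FALSE)    # omit the row-name column

    bods2 <- read.csv(path)
    dim(bods2)           # 2 3
    bods2$height         # 1.65 1.56
    unlink(path)         # clean up the temporary file
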