Working with Data

Scalars

n <- 3.14
s <- 'c'
b <- TRUE
typeof(n)
'double'
typeof(s)
'character'
typeof(b)
'logical'

Vectors

Vectors are 1D collections of the same scalar type.

xs <- c(1, 0.5, 0.25)
ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')
bs <- c(T, T, F, F, T, T, F, F)
xs
  1. 1
  2. 0.5
  3. 0.25
ss
  1. 'G'
  2. 'A'
  3. 'T'
  4. 'T'
  5. 'A'
  6. 'C'
  7. 'A'
bs
  1. TRUE
  2. TRUE
  3. FALSE
  4. FALSE
  5. TRUE
  6. TRUE
  7. FALSE
  8. FALSE
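
Because a vector holds a single scalar type, c() converts mixed inputs to the most general type among them. The short sketch below illustrates this; the names mixed_num and mixed_chr are made up for the example.

```r
# Mixing types in c() coerces every element to the most general type present.
mixed_num <- c(1, TRUE, FALSE)   # logical values become doubles (1 and 0)
mixed_chr <- c(1, 'a', TRUE)     # everything becomes a character string

typeof(mixed_num)   # 'double'
typeof(mixed_chr)   # 'character'
mixed_chr           # '1' 'a' 'TRUE'
```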

Extracting a single element

xs[1]
1

Extracting elements with a position vector

ss[2:5]
  1. 'A'
  2. 'T'
  3. 'T'
  4. 'A'

Extracting elements with a logical vector

ss[bs]
  1. 'G'
  2. 'A'
  3. 'A'
  4. 'C'

Extracting elements with a logical condition

ss[ss %in% c('A', 'T')]
  1. 'A'
  2. 'T'
  3. 'T'
  4. 'A'
  5. 'A'
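
A logical condition is itself just a logical vector, so it can be stored and inspected before it is used as an index. The sketch below rebuilds the same DNA letters under the assumed names dna and keep so that it runs on its own.

```r
dna <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')  # same values as ss above

# The condition evaluates to one TRUE/FALSE per element.
keep <- dna %in% c('A', 'T')
keep          # FALSE TRUE TRUE TRUE TRUE FALSE TRUE

dna[keep]     # 'A' 'T' 'T' 'A' 'A'
which(keep)   # positions of the TRUE values: 2 3 4 5 7
sum(keep)     # number of matches: 5
```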

Matrices and Arrays

Like vectors, only in 2D (matrices) or more dimensions (arrays).

m <- matrix(1:12, ncol=4)
m
1 4 7 10
2 5 8 11
3 6 9 12
m[6:10]
  1. 6
  2. 7
  3. 8
  4. 9
  5. 10
m[m < 10]
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
m[2,]
  1. 2
  2. 5
  3. 8
  4. 11
m[,2]
  1. 4
  2. 5
  3. 6
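
Indexing a matrix with a single vector, as in m[6:10], works because R stores matrix elements column by column. The sketch below contrasts the default fill order with byrow = TRUE; m_col and m_row are names chosen for this illustration.

```r
m_col <- matrix(1:12, ncol = 4)                # filled down the columns (default)
m_row <- matrix(1:12, ncol = 4, byrow = TRUE)  # filled across the rows

dim(m_col)     # 3 4 (rows, columns)
m_col[2, 3]    # 8: row 2, column 3
m_col[8]       # 8: the 8th element counting down the columns

m_row[2, 3]    # 7
m_row[6:10]    # 10 3 7 11 4: still read in column order
```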

Try to solve the following problems without searching the web. You can use the built-in help() function.

Create the following \(3 \times 3\) matrix and save it in a variable called A.

  • Row 1 = 4, 5, 6
  • Row 2 = 1, 2, 3
  • Row 3 = 7, 8, 9

What is the sum of all the numbers in A?

Create a vector of the column sums in A using the colSums function.

Create a vector of the row sums in A using the apply function.

What is the sum of the numbers in the bottom right \(2 \times 2\) block (i.e., the numbers 2, 3, 8, 9)?

Lists

ls <- list(dna=ss, ispurine=ss %in% c('A', 'G'))
ls
$dna
  1. 'G'
  2. 'A'
  3. 'T'
  4. 'T'
  5. 'A'
  6. 'C'
  7. 'A'
$ispurine
  1. TRUE
  2. TRUE
  3. FALSE
  4. FALSE
  5. TRUE
  6. FALSE
  7. TRUE

Extracting a sublist from a list

ls[1]
$dna
  1. 'G'
  2. 'A'
  3. 'T'
  4. 'T'
  5. 'A'
  6. 'C'
  7. 'A'
class(ls[1])
'list'

Extracting an element from a list

ls$dna
  1. 'G'
  2. 'A'
  3. 'T'
  4. 'T'
  5. 'A'
  6. 'C'
  7. 'A'
class(ls$dna)
'character'
ls[[1]]
  1. 'G'
  2. 'A'
  3. 'T'
  4. 'T'
  5. 'A'
  6. 'C'
  7. 'A'
class(ls[[1]])
'character'
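
The difference between single and double brackets can be checked directly: single brackets always return a list, while double brackets (or $) return the element itself. A minimal sketch, using a hypothetical list called seqs:

```r
seqs <- list(dna = c('G', 'A', 'T'), n = 3)  # hypothetical example list

sub <- seqs['dna']     # sublist: still a list, of length 1
elt <- seqs[['dna']]   # element: the character vector itself

class(sub)     # 'list'
class(elt)     # 'character'
length(sub)    # 1
length(elt)    # 3

identical(elt, seqs$dna)     # TRUE: $ and [[ ]] extract the same element
names(seqs[c('dna', 'n')])   # single brackets can select several elements
```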

Data frames

A data frame is a special list of vectors where all the vectors have the same length. Because all the vectors have the same length, it can also be thought of as a 2D table or matrix and manipulated in the same way.

df <- as.data.frame(ls)
class(ls)
'list'
class(df)
'data.frame'
df
  dna ispurine
1   G     TRUE
2   A     TRUE
3   T    FALSE
4   T    FALSE
5   A     TRUE
6   C    FALSE
7   A     TRUE
df[4:6, ]
  dna ispurine
4   T    FALSE
5   A     TRUE
6   C    FALSE
df$ispurine
  1. TRUE
  2. TRUE
  3. FALSE
  4. FALSE
  5. TRUE
  6. FALSE
  7. TRUE
df[df$ispurine, ]
  dna ispurine
1   G     TRUE
2   A     TRUE
5   A     TRUE
7   A     TRUE
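
Since a data frame is both a list of columns and a 2D table, list-style and matrix-style indexing can be used on the same object. The sketch below shows both on a small data frame named seq_df, which is an assumption for this example; stringsAsFactors = FALSE is set explicitly so the dna column is character in any R version.

```r
seq_df <- data.frame(dna = c('G', 'A', 'T'),
                     ispurine = c(TRUE, TRUE, FALSE),
                     stringsAsFactors = FALSE)

# List view: each column is one list element.
is.list(seq_df)    # TRUE
length(seq_df)     # 2 (number of columns)
seq_df[['dna']]    # 'G' 'A' 'T'

# Table view: index by [row, column].
dim(seq_df)                      # 3 2
seq_df[2, 'dna']                 # 'A'
seq_df[seq_df$ispurine, 'dna']   # 'G' 'A'
```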

Creating a data frame from scratch

gender <- c('M', 'M', 'F', 'F', 'M', 'F', 'M')
height <- c(1.65, 1.82, 1.56, 1.66, 1.72, 1.6, 1.8)
weight <- c(65, 102, 55, 46, 78, 60, 72)

bods <- data.frame(gender, height, weight)
bods
  gender height weight
1      M   1.65     65
2      M   1.82    102
3      F   1.56     55
4      F   1.66     46
5      M   1.72     78
6      F    1.6     60
7      M    1.8     72

We can add a new calculated column easily. Let’s include the body mass index (bmi).

bods$bmi <- bods$weight/bods$height^2
bods
  gender height weight      bmi
1      M   1.65     65 23.87511
2      M   1.82    102 30.79338
3      F   1.56     55 22.60026
4      F   1.66     46 16.69328
5      M   1.72     78  26.3656
6      F    1.6     60  23.4375
7      M    1.8     72 22.22222
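
The calculation works because arithmetic on columns is applied element by element, and ^ binds more tightly than /, so the expression is weight divided by height squared. A small sketch with made-up values (the people data frame is hypothetical) also shows transform() as one alternative that returns a modified copy:

```r
people <- data.frame(height = c(1.5, 2.0), weight = c(45, 80))  # hypothetical values

# Element-wise: 45 / 1.5^2 and 80 / 2.0^2
people$bmi <- people$weight / people$height^2
people$bmi    # 20 20

# transform() evaluates the expression using the column names directly
# and returns a new data frame instead of modifying people in place.
with_bmi <- transform(people[, c('height', 'weight')], bmi = weight / height^2)
names(with_bmi)   # 'height' 'weight' 'bmi'
```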

Let’s get rid of the bmi column.

bods$bmi <- NULL
bods
  gender height weight
1      M   1.65     65
2      M   1.82    102
3      F   1.56     55
4      F   1.66     46
5      M   1.72     78
6      F    1.6     60
7      M    1.8     72

How many males are there?

What is the mean height?

What is the mean weight for females?

A person is classified as obese if their BMI exceeds 30. Add the BMI column back into the data frame, as well as a new logical column is.obese indicating if a person is obese or not.

Reading data from files or URLs into data frames

See the examples on the Quick-R website.
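
As a minimal sketch of the idea, the example below writes a tiny CSV to a temporary file and reads it back with read.csv(). The file contents and the names csv_path and bods_in are assumptions made for this illustration; read.csv() also accepts a URL in place of the file path.

```r
# Create a small CSV file to read (a temporary file, so nothing is left behind).
csv_path <- tempfile(fileext = '.csv')
writeLines(c('gender,height,weight',
             'M,1.65,65',
             'F,1.56,55'),
           csv_path)

# The first line is used as column names; column types are guessed from the data.
bods_in <- read.csv(csv_path, stringsAsFactors = FALSE)

class(bods_in)       # 'data.frame'
nrow(bods_in)        # 2
names(bods_in)       # 'gender' 'height' 'weight'
mean(bods_in$height) # 1.605

unlink(csv_path)     # remove the temporary file
```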