Practice 4 ---------------------------------------- .. code:: python library(ggplot2) library(reshape2) library(pheatmap) library(genefilter) # for rowttests library(nlme) data(Rail) .. parsed-literal:: Attaching package: ‘genefilter’ The following object is masked from ‘package:base’: anyNA **Exericse 0** Using base graphics, make a 2 x 2 grid for plotting. Top left - Generate 100 random numbers from a t-distribution with 5 degrees of freedom - Plot a normalized histogram (i.e. area sums to 1) with 8 breaks - Overlay a smoothed density estimate (using ``density``) in orange - Overlay the true density estimate (using ``dt``) in red - Add a rug Tor right - Using the ``iris`` data set, make a scatter plot of Sepal length agaisnt Sepal width. Color the points by Species. - Add your own title and x and y lables Bottom left - Generate 100 numbers from a Poisson distribution with rate = 3 - Plot a bar chart showing the counts for each value Bottom right - Make a box and whiskers plot of the 4 numeric variables in the ``iris`` data set - Show vertical labels for all Species names - Adjust the margins so that the labels are visible using the ``mar`` (bottom, left, top, right) parameter. You can see the default values with ``par()$mar``. Remember to restore the orignal parameters at the end. Save the plot as a Portable Neetwork Graphics (png) file. .. code:: python png("myplot.png") par(mfrow=c(2,2)) x <- rt(100, df=5) hist(x, breaks = 8, probability = TRUE) lines(density(x), col="orange") xp <- seq(min(x), max(x), length.out = 50) lines(xp, dt(xp, df=5), col="red") rug(x) plot(iris$Sepal.Width, iris$Sepal.Length, col=iris$Species, main="Scatter by Iris species", ylab="Length", xlab="Width") n <- rpois(100, lambda = 3) barplot(table(n)) orig <- par(no.readonly=TRUE) par(mar=c(6.1,4.1,4.1,2.1)) boxplot(iris[,1:4], las=2) par(orig) dev.off() .. raw:: html pdf: 2 .. code:: python head(Rail) .. raw:: html
Railtravel
1155
2153
3154
4226
5237
6232
Exercise 1 ^^^^^^^^^^ - Use the ggplot function - Create a scatter plot with Rail on the x-axis and travel on the y-axis. - CHange the title to "I made this!" - Change the y-axis label to be "Zero-force travel time (nano-seconds)" - Change the size of the points to 5 - Change the color of potins to blue and transparency to 0.5 - Add a simple linear regression line to the plot with 90% confidence intervals .. code:: python ggplot(Rail, aes(x=as.numeric(Rail), y=travel)) + geom_point(size=5, color="blue", alpha=0.5) + geom_smooth(method="lm") + labs(title="I made this", ylab="Zero-force travel time (nano-seconds)") .. image:: PracticeFOURSolutions_files/PracticeFOURSolutions_6_0.png :width: 600 Exercise 2 ^^^^^^^^^^ Here we will try to replicate the noise discovery heatmaps shown in the statistics class. .. code:: python set.seed(123) n <- 20 # number of subjects m <- 20000 # number of genes alpha <- 0.005 # significance level # create a matrix of gene expression values with m rows and 2*n columns M <- matrix(rnorm(2*n*m), m, 2*n) # give row and column names rownames(M) <- paste("G", 1:m, sep="") colnames(M) <- paste("id", 1:(2*n), sep="") # assign subjects inot equal sized groups grp <- factor(rep(0:1, c(n, n))) # calculate p-value using t-test for mean experession value of each gene pvals <- rowttests(M, grp)$p.value # extract the genes which meet the specified significance level hits <- M[pvals < alpha,] - Use pheatmap to plot a heatmap - Remove the row names (Use tAB or R's built-in help to figure out to do this) - Use this color palette to map expression values to a red-blakc-green scale ``colorRampPalette(c("red3", "black", "green3"))(50)`` .. code:: python pheatmap(hits, show_rownames = FALSE, color = colorRampPalette(c("red3", "black", "green3"))(50)) .. image:: PracticeFOURSolutions_files/PracticeFOURSolutions_10_0.png :width: 600 .. code:: python a1 <- 1 a2 <- 2 sigma1 <- 25 sigma2 <- 25 subject <- paste("PID", rep(1:2, each=5)) dose <- rep(seq(10, 100, 20), 2) geneA <- a1 * dose + rnorm(length(dose), sd=sigma1) geneB <- a2 * dose + rnorm(length(dose), sd=sigma2) df <- data.frame(subject=subject, dose=dose, geneA=geneA, geneB=geneB) .. code:: python df .. raw:: html
subjectdosegeneAgeneB
1PID 110-3.367338-31.18716
2PID 13054.0361391.78636
3PID 15022.40099124.477
4PID 17073.61274111.6039
5PID 19084.05749188.5058
6PID 210-13.6317912.04911
7PID 23040.0347257.40407
8PID 25064.9874779.04738
9PID 27091.22307147.4357
10PID 29070.94818166.8696
**Exercise 3** Using ``ggplot2``, plot gene expression agaisnt does, using differnt colors for differetn genes and differnet shapes for different subjects. (Hint: The ``reshape2`` library may come in useful) .. code:: python md <- melt(df, id=c("subject", "dose")) .. code:: python md .. raw:: html
subjectdosevariablevalue
1PID 110geneA-3.367338
2PID 130geneA54.03613
3PID 150geneA22.40099
4PID 170geneA73.61274
5PID 190geneA84.05749
6PID 210geneA-13.63179
7PID 230geneA40.03472
8PID 250geneA64.98747
9PID 270geneA91.22307
10PID 290geneA70.94818
11PID 110geneB-31.18716
12PID 130geneB91.78636
13PID 150geneB124.477
14PID 170geneB111.6039
15PID 190geneB188.5058
16PID 210geneB12.04911
17PID 230geneB57.40407
18PID 250geneB79.04738
19PID 270geneB147.4357
20PID 290geneB166.8696
.. code:: python ggplot(md, aes(x=dose, y=value, color=variable, shape=subject)) + geom_point(size=5) .. image:: PracticeFOURSolutions_files/PracticeFOURSolutions_16_0.png :width: 600 **Exercise 4** Usign ``ggplot2``, make a grid of plots with separae colummsn for each subject and separate rows for each gene. Vary the color and size by the gene expression value. Add a linear regression fit with 95% confidence intervals to each plot. .. code:: python ggplot(md, aes(x=dose, y=value, color=value)) + geom_point(size=3) + geom_smooth(method="lm") + facet_grid(variable ~ subject) .. image:: PracticeFOURSolutions_files/PracticeFOURSolutions_18_0.png :width: 600