Practice 4
----------------------------------------

.. code:: python

    library(ggplot2)
    library(reshape2)
    library(pheatmap)
    library(genefilter) # for rowttests
    library(nlme)
    data(Rail)


.. parsed-literal::

    
    Attaching package: ‘genefilter’
    
    The following object is masked from ‘package:base’:
    
        anyNA
    


**Exericse 0**

Using base graphics, make a 2 x 2 grid for plotting.

Top left - Generate 100 random numbers from a t-distribution with 5
degrees of freedom - Plot a normalized histogram (i.e. area sums to 1)
with 8 breaks - Overlay a smoothed density estimate (using ``density``)
in orange - Overlay the true density estimate (using ``dt``) in red -
Add a rug

Tor right - Using the ``iris`` data set, make a scatter plot of Sepal
length agaisnt Sepal width. Color the points by Species. - Add your own
title and x and y lables

Bottom left - Generate 100 numbers from a Poisson distribution with rate
= 3 - Plot a bar chart showing the counts for each value

Bottom right - Make a box and whiskers plot of the 4 numeric variables
in the ``iris`` data set - Show vertical labels for all Species names -
Adjust the margins so that the labels are visible using the ``mar``
(bottom, left, top, right) parameter. You can see the default values
with ``par()$mar``. Remember to restore the orignal parameters at the
end.

Save the plot as a Portable Neetwork Graphics (png) file.

.. code:: python

    png("myplot.png")
    par(mfrow=c(2,2))
    x <- rt(100, df=5)
    hist(x, breaks = 8, probability = TRUE)
    lines(density(x), col="orange")
    xp <- seq(min(x), max(x), length.out = 50)
    lines(xp, dt(xp, df=5), col="red")
    rug(x)
    plot(iris$Sepal.Width, iris$Sepal.Length, col=iris$Species, 
         main="Scatter by Iris species", ylab="Length", xlab="Width")
    n <- rpois(100, lambda = 3)
    barplot(table(n))
    orig <- par(no.readonly=TRUE)
    par(mar=c(6.1,4.1,4.1,2.1))
    boxplot(iris[,1:4], las=2)
    par(orig)
    dev.off()

.. raw:: html

    <strong>pdf:</strong> 2



.. code:: python

    head(Rail)




.. raw:: html

    <table>
    <thead><tr><th></th><th scope=col>Rail</th><th scope=col>travel</th></tr></thead>
    <tbody>
    	<tr><th scope=row>1</th><td>1</td><td>55</td></tr>
    	<tr><th scope=row>2</th><td>1</td><td>53</td></tr>
    	<tr><th scope=row>3</th><td>1</td><td>54</td></tr>
    	<tr><th scope=row>4</th><td>2</td><td>26</td></tr>
    	<tr><th scope=row>5</th><td>2</td><td>37</td></tr>
    	<tr><th scope=row>6</th><td>2</td><td>32</td></tr>
    </tbody>
    </table>




Exercise 1
^^^^^^^^^^

-  Use the ggplot function
-  Create a scatter plot with Rail on the x-axis and travel on the
   y-axis.
-  CHange the title to "I made this!"
-  Change the y-axis label to be "Zero-force travel time (nano-seconds)"
-  Change the size of the points to 5
-  Change the color of potins to blue and transparency to 0.5
-  Add a simple linear regression line to the plot with 90% confidence
   intervals

.. code:: python

    ggplot(Rail, aes(x=as.numeric(Rail), y=travel)) +
    geom_point(size=5, color="blue", alpha=0.5) +
    geom_smooth(method="lm") +
    labs(title="I made this", ylab="Zero-force travel time (nano-seconds)")


.. image:: PracticeFOURSolutions_files/PracticeFOURSolutions_6_0.png
   :width: 600




Exercise 2
^^^^^^^^^^

Here we will try to replicate the noise discovery heatmaps shown in the
statistics class.

.. code:: python

    set.seed(123)
    
    n <- 20 # number of subjects
    m <- 20000 # number of genes 
    alpha <- 0.005 # significance level
    
    # create a matrix of gene expression values with m rows and 2*n columns
    M <- matrix(rnorm(2*n*m), m, 2*n)
    
    # give row and column names
    rownames(M) <- paste("G", 1:m, sep="")
    colnames(M) <- paste("id", 1:(2*n), sep="")
    
    # assign subjects inot equal sized groups
    grp <- factor(rep(0:1, c(n, n)))
    
    # calculate p-value using t-test for mean experession value of each gene
    pvals <- rowttests(M, grp)$p.value
    
    # extract the genes which meet the specified significance level
    hits <- M[pvals < alpha,]

-  Use pheatmap to plot a heatmap
-  Remove the row names (Use tAB or R's built-in help to figure out to
   do this)
-  Use this color palette to map expression values to a red-blakc-green
   scale ``colorRampPalette(c("red3", "black", "green3"))(50)``

.. code:: python

    pheatmap(hits, show_rownames = FALSE, color = colorRampPalette(c("red3", "black", "green3"))(50))



.. image:: PracticeFOURSolutions_files/PracticeFOURSolutions_10_0.png
   :width: 600



.. code:: python

    a1 <- 1
    a2 <- 2
    sigma1 <- 25
    sigma2 <- 25
    
    subject <- paste("PID", rep(1:2, each=5))
    dose <- rep(seq(10, 100, 20), 2)
    geneA <- a1 * dose + rnorm(length(dose), sd=sigma1)
    geneB <- a2 * dose + rnorm(length(dose), sd=sigma2)
    
    df <- data.frame(subject=subject, dose=dose, geneA=geneA, geneB=geneB)

.. code:: python

    df




.. raw:: html

    <table>
    <thead><tr><th></th><th scope=col>subject</th><th scope=col>dose</th><th scope=col>geneA</th><th scope=col>geneB</th></tr></thead>
    <tbody>
    	<tr><th scope=row>1</th><td>PID 1</td><td>10</td><td>-3.367338</td><td>-31.18716</td></tr>
    	<tr><th scope=row>2</th><td>PID 1</td><td>30</td><td>54.03613</td><td>91.78636</td></tr>
    	<tr><th scope=row>3</th><td>PID 1</td><td>50</td><td>22.40099</td><td>124.477</td></tr>
    	<tr><th scope=row>4</th><td>PID 1</td><td>70</td><td>73.61274</td><td>111.6039</td></tr>
    	<tr><th scope=row>5</th><td>PID 1</td><td>90</td><td>84.05749</td><td>188.5058</td></tr>
    	<tr><th scope=row>6</th><td>PID 2</td><td>10</td><td>-13.63179</td><td>12.04911</td></tr>
    	<tr><th scope=row>7</th><td>PID 2</td><td>30</td><td>40.03472</td><td>57.40407</td></tr>
    	<tr><th scope=row>8</th><td>PID 2</td><td>50</td><td>64.98747</td><td>79.04738</td></tr>
    	<tr><th scope=row>9</th><td>PID 2</td><td>70</td><td>91.22307</td><td>147.4357</td></tr>
    	<tr><th scope=row>10</th><td>PID 2</td><td>90</td><td>70.94818</td><td>166.8696</td></tr>
    </tbody>
    </table>




**Exercise 3**

Using ``ggplot2``, plot gene expression agaisnt does, using differnt
colors for differetn genes and differnet shapes for different subjects.
(Hint: The ``reshape2`` library may come in useful)

.. code:: python

    md <- melt(df, id=c("subject", "dose"))

.. code:: python

    md




.. raw:: html

    <table>
    <thead><tr><th></th><th scope=col>subject</th><th scope=col>dose</th><th scope=col>variable</th><th scope=col>value</th></tr></thead>
    <tbody>
    	<tr><th scope=row>1</th><td>PID 1</td><td>10</td><td>geneA</td><td>-3.367338</td></tr>
    	<tr><th scope=row>2</th><td>PID 1</td><td>30</td><td>geneA</td><td>54.03613</td></tr>
    	<tr><th scope=row>3</th><td>PID 1</td><td>50</td><td>geneA</td><td>22.40099</td></tr>
    	<tr><th scope=row>4</th><td>PID 1</td><td>70</td><td>geneA</td><td>73.61274</td></tr>
    	<tr><th scope=row>5</th><td>PID 1</td><td>90</td><td>geneA</td><td>84.05749</td></tr>
    	<tr><th scope=row>6</th><td>PID 2</td><td>10</td><td>geneA</td><td>-13.63179</td></tr>
    	<tr><th scope=row>7</th><td>PID 2</td><td>30</td><td>geneA</td><td>40.03472</td></tr>
    	<tr><th scope=row>8</th><td>PID 2</td><td>50</td><td>geneA</td><td>64.98747</td></tr>
    	<tr><th scope=row>9</th><td>PID 2</td><td>70</td><td>geneA</td><td>91.22307</td></tr>
    	<tr><th scope=row>10</th><td>PID 2</td><td>90</td><td>geneA</td><td>70.94818</td></tr>
    	<tr><th scope=row>11</th><td>PID 1</td><td>10</td><td>geneB</td><td>-31.18716</td></tr>
    	<tr><th scope=row>12</th><td>PID 1</td><td>30</td><td>geneB</td><td>91.78636</td></tr>
    	<tr><th scope=row>13</th><td>PID 1</td><td>50</td><td>geneB</td><td>124.477</td></tr>
    	<tr><th scope=row>14</th><td>PID 1</td><td>70</td><td>geneB</td><td>111.6039</td></tr>
    	<tr><th scope=row>15</th><td>PID 1</td><td>90</td><td>geneB</td><td>188.5058</td></tr>
    	<tr><th scope=row>16</th><td>PID 2</td><td>10</td><td>geneB</td><td>12.04911</td></tr>
    	<tr><th scope=row>17</th><td>PID 2</td><td>30</td><td>geneB</td><td>57.40407</td></tr>
    	<tr><th scope=row>18</th><td>PID 2</td><td>50</td><td>geneB</td><td>79.04738</td></tr>
    	<tr><th scope=row>19</th><td>PID 2</td><td>70</td><td>geneB</td><td>147.4357</td></tr>
    	<tr><th scope=row>20</th><td>PID 2</td><td>90</td><td>geneB</td><td>166.8696</td></tr>
    </tbody>
    </table>




.. code:: python

    ggplot(md, aes(x=dose, y=value, color=variable, shape=subject)) + 
    geom_point(size=5)



.. image:: PracticeFOURSolutions_files/PracticeFOURSolutions_16_0.png
   :width: 600



**Exercise 4**

Usign ``ggplot2``, make a grid of plots with separae colummsn for each
subject and separate rows for each gene. Vary the color and size by the
gene expression value. Add a linear regression fit with 95% confidence
intervals to each plot.

.. code:: python

    ggplot(md, aes(x=dose, y=value, color=value)) + 
    geom_point(size=3) +
    geom_smooth(method="lm") +
    facet_grid(variable ~ subject)



.. image:: PracticeFOURSolutions_files/PracticeFOURSolutions_18_0.png
   :width: 600