The first thing you should do with this notebook is make a copy! Go to the file menu and choose ‘Make a Copy’.
Great! Now you should be working in a file called ‘BasicRinJupyterAndRstudio-Copy1’
Briefly go back to the other tab and choose ‘Close and Halt’ from the file menu. Why did we do this? Because we keep the lecture materials in something called a ‘GitHub repository’. We may need to make changes to the notebooks, and if and when we do, we will show you how to update your VM with the new materials. We want whatever changes you make to be saved under different file names, and this seems to be the easiest way. (Note: even if you don’t think you are making changes, the notebook ‘autosaves’, so your notebook will almost always be considered ‘changed’ as far as GitHub is concerned. Anyway, just trust me!)
R is a programming environment created specifically for statistics. It is a scripting language (if you don’t know what that means, don’t worry for now). R can be used interactively (as we will see in this notebook), or it can be told to execute a list of commands stored in a plain text file (called a ‘script’).
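As a minimal sketch of the ‘script’ idea, the cell below writes two commands to a plain text file and then asks R to execute the whole file with source(). The temporary file and the variable names x and y are illustrative assumptions, not part of the course materials.

```r
# Write a two-line script to a temporary file (the file name is arbitrary)
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- 3 + 1", "y <- x * 2"), script_path)

# Execute every command stored in the file, in order
source(script_path)

# The variables created by the script are now available interactively
y
```

From a terminal, the same file could also be run non-interactively with the Rscript program followed by the file name.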
From within the Jupyter notebook, we can access the R ‘kernel’ (the program that interprets R code and returns results). This is just one way to use R. We will also learn to use a program called Rstudio.
As you can see, the notebook is browser based (it opens a window in your browser) and works a lot like a web server. This notebook is running an R kernel, but we could choose a Python kernel, a bash kernel (Unix shell), or any other from a long and growing list:
https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
Be careful, though. Much of this is under development and considered ‘beta’ (or even alpha) - the tools can be buggy. We will be careful to use only the better developed parts of the Jupyter universe.
The notebook is composed of ‘cells’. Cells can be either ‘code’ or ‘markdown’. Code cells are for writing R commands. Markdown cells are for text; markdown is a lightweight markup language that gets converted to html. This cell is a markdown cell.
More about markdown here:
https://en.wikipedia.org/wiki/Markdown
You can type anything you want in markdown (though there are some special characters that will be interpreted as commands).
Code cells require R syntax. The following cell is an R cell:
# This is an R cell. The '#' tells R this is a comment
3+1
When I run the code cell, it executes the R code (3+1) and returns the output in an output cell. To run a code cell, you can type shift-enter, or press the run button at the top of the screen.
In any programming language, we have the notion of ‘data modes’ and ‘data structures’. This is because programs manipulate data, and different kinds of data require different manipulations. For example, numbers are treated differently than characters (or strings of characters) and single numbers are treated differently than lists of numbers (vectors) or arrays of numbers (matrices).
The following are some simple R data modes:
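As a minimal sketch, the cell below checks a few common modes with class(). The variable names and values are arbitrary examples chosen for illustration.

```r
# A few common modes, checked with class()
num <- 3.14      # numeric: R's default for numbers
int <- 2L        # integer: the L suffix asks for a whole number
chr <- "hello"   # character: a string
lgl <- TRUE      # logical: TRUE or FALSE

class(num)
class(int)
class(chr)
class(lgl)
```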
Modes can be combined to form data structures:
class(c(1,3,2,8.4)) #This is a vector
class(matrix(c(1,3,2,4,5,6),nrow=2,ncol=3)) #This is a matrix
"This is a string!"
data.frame(c("This is a string","This is another string"),matrix(c(1:6),nrow=2,ncol=3))
| | c..This.is.a.string....This.is.another.string.. | X1 | X2 | X3 |
---|---|---|---|---|
1 | This is a string | 1 | 3 | 5 |
2 | This is another string | 2 | 4 | 6 |
The important thing to note above is the combination of both character and numeric data into one object! That is what is special about data frames.
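To see the mixture of modes directly, here is a small sketch that builds the same data frame with a readable name for the text column and then asks each column for its class. The column name label is a hypothetical choice, and stringsAsFactors = FALSE is set explicitly because older versions of R would otherwise convert strings to factors.

```r
# Same data as above, but with a named character column
df <- data.frame(
  label = c("This is a string", "This is another string"),
  matrix(1:6, nrow = 2, ncol = 3),
  stringsAsFactors = FALSE   # keep the text column as character
)

names(df)           # column names
sapply(df, class)   # one class per column: character and integer side by side
```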
We have just seen this command in action. The ‘c’ command combines objects by concatenation. For example:
c(5,6,7)
creates a vector of length 3. We can append to that vector, like so:
c(c(5,6,7),8)
Of course, we would usually have named the first vector something:
v1<-c(5,6,7)
v2<-c(v1,8)
print(v1)
print(v2)
[1] 5 6 7
[1] 5 6 7 8
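One thing worth knowing about c() is that a vector holds a single mode, so combining numbers with a string converts everything to character. The sketch below, using the hypothetical names nums, longer and mixed, also shows length() and indexing with square brackets.

```r
nums   <- c(5, 6, 7)
longer <- c(nums, 8)

length(longer)   # number of elements in the vector
longer[4]        # the fourth element, the one we appended

# Mixing numbers and a string: every element becomes a character string
mixed <- c(nums, "eight")
class(mixed)
```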
Now, if we would like to create a matrix, we could use the matrix command as above:
matrix(c(1,3,2,4,5,6),nrow=2,ncol=3)
1 | 2 | 5 |
3 | 4 | 6 |
Or, we could create two vectors and combine them:
v1<-c(1,2,5)
v2<-c(3,4,6)
m1<-rbind(v1,v2)
m1
class(m1)
v1 | 1 | 2 | 5 |
---|---|---|---|
v2 | 3 | 4 | 6 |
Notice that R has automatically assigned row names for us. Thank you, R! We can also use the column-based version, cbind (rbind means ‘row bind’, cbind means ‘column bind’), to append a column to a matrix:
m1<-matrix(c(1,2,3,4),nrow=2,ncol=2)
m1
1 | 3 |
2 | 4 |
m2<-cbind(m1,c(5,6))
m2
1 | 3 | 5 |
2 | 4 | 6 |
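As a quick check of how the two binding commands differ, the sketch below attaches the same vector to a 2 x 2 matrix once as a column and once as a row, then compares the dimensions with dim(). The names m, wide and tall are illustrative only.

```r
m <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)

wide <- cbind(m, c(5, 6))   # adds a third column
tall <- rbind(m, c(5, 6))   # adds a third row

dim(wide)   # rows, then columns
dim(tall)
```

If a cell like this reports that an object was not found, the cell that creates that object has probably not been run yet in the current session.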
If you know the name of the command you want to use, you can just type ‘? command_name’ in a code cell and run it, like so:
?help
help {utils} | R Documentation |
help is the primary interface to the help systems.
help(topic, package = NULL, lib.loc = NULL, verbose = getOption("verbose"), try.all.packages = getOption("help.try.all.packages"), help_type = getOption("help_type"))
The full page goes on to document each argument (topic, package, lib.loc, verbose, try.all.packages, help_type), the available types of help (plain text, HTML and PDF), and related commands: ? for shortcuts to help topics, help.search() or ?? for finding help pages on a vague topic, help.start() which opens the HTML version of the R help pages, library() for listing available packages, data() for listing available data sets, and methods(). Its examples include help(lapply), help("for") (quotes or backticks are needed for reserved words) and help(package = "splines").
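A small sketch of looking things up from code: help() can be called like any other function, and args() prints a function's arguments without opening the full page. The choice of mean and matrix as topics is arbitrary.

```r
# help("mean") finds the same page as ?mean; assigning the result
# stores it instead of displaying it
topic_help <- help("mean")
class(topic_help)

# A quick reminder of a function's arguments, without the full help page
args(matrix)
```

Typing the name topic_help on its own line (or running ?mean directly) displays the page.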
If you want to do something, but don’t know the command in R, Google can be a great tool! Go ahead and google how to create a histogram in R.
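One answer such a search is likely to turn up is the base R function hist(). The sketch below is only one way to use it; the simulated data, the seed and the axis labels are assumptions made for illustration.

```r
set.seed(1)        # make the random numbers reproducible
x <- rnorm(100)    # 100 draws from a standard normal distribution

# Draw the histogram; hist() also returns the bin counts it plotted
h <- hist(x, main = "Histogram of 100 random values", xlab = "value")
sum(h$counts)      # every observation falls in exactly one bin
```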
Sure, we know we need to work to learn new things - but let’s not underestimate the power of play! Take a few moments to play in this new sandbox. Create some vectors, matrices, strings, etc. What can you do? Can you figure out how to make R multiply a matrix times (an appropriately sized) vector? Multiply two matrices? What happens if you add two vectors? Multiply? Do you get the answer you expect?
# Start here - you can work within the lecture notes! How cool is that?
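If you get stuck, here is one possible starting point. It assumes the small example matrices A and B and the vector v defined in the cell, and it contrasts the matrix product operator %*% with the ordinary element-by-element operators + and *.

```r
A <- matrix(1:6, nrow = 2, ncol = 3)   # 2 x 3 matrix
B <- matrix(1:6, nrow = 3, ncol = 2)   # 3 x 2 matrix
v <- c(1, 0, 2)                        # length 3, matches the columns of A

Av <- A %*% v    # matrix times vector: a 2 x 1 result
AB <- A %*% B    # matrix times matrix: a 2 x 2 result
Av
AB

# + and * on two vectors work element by element
vec_sum  <- c(1, 2, 3) + c(10, 20, 30)
vec_prod <- c(1, 2, 3) * c(10, 20, 30)
vec_sum
vec_prod
```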
Now, the notebook is a great environment, especially for doing reproducible research, documenting all your steps and keeping track of things during exploratory analysis. There is another R interface available, and it is much more ‘mature’ than the Jupyter project (mature does not mean ‘better’ - just that some features available in R and Rstudio may not yet be incorporated in the Jupyter R kernel).
Your VM has been set up as an Rstudio ‘server’. This means that you can connect to it in much the same way as you connected to the notebook server. Use the following URL:
http://colab-sbx-XXX.oit.duke.edu:8787
Rstudio Server uses the system login authentication, so type in your VM username (bitnami) and the password you were assigned by Duke’s OIT.
Your window should look like so:
In the upper left corner, we have a ‘script’ window. This is where you type code that you would like to save into a file. In the lower left is a console window. This is where you type code to execute. It is just like the command line in Unix: you type code, press enter, and it gets executed.
On the right hand side is a window with tabs for Files, Plots, Packages, Help and Viewer. As we are beginners, the ‘help’ tab will be the most relevant. Here, we can find information on syntax, what functions do what, tutorials, etc.