General Overview
R (http://cran.at.r-project.org) is a comprehensive statistical environment and programming language for professional data analysis and graphical display. The associated Bioconductor project provides many additional R packages for statistical data analysis in different life science areas, such as tools for microarray, next-generation sequencing and genome analysis. The R software is free and runs on all common operating systems.
Scope of this Manual
This R tutorial provides a condensed introduction to the usage of the R environment and its utilities for general data analysis and clustering. It also introduces a subset of packages from the Bioconductor project. The included packages are a 'personal selection' of the author of this manual and do not reflect the full utility spectrum of the R/Bioconductor projects. Many packages were chosen because the author uses them often for his own teaching and research. To obtain a broad overview of available R packages, it is strongly recommended to consult the official Bioconductor and R project sites. Because most packages are developed rapidly, this manual will often not be fully up to date. For this and other reasons, the original documentation of each package (PDF manual or vignette) should always be used as the primary source of documentation. Users are welcome to send suggestions for improving this manual directly to its author.
Format of this Manual
A practical copy & paste format has been chosen throughout this manual, even though it is not always easy to read. In this format all commands are shown in code boxes, with the comments given in blue color. To save space, several commands are often concatenated on one line and separated with a semicolon ';'. All comments/explanations start with the standard comment sign '#' to prevent them from being interpreted by R as commands. This way several commands can be pasted together with their comment text into the R console to demo the different functions and analysis steps. Commands starting with a '$' sign need to be executed from a Unix or Linux shell. Windows users can simply ignore them. Commands highlighted in red color are considered essential knowledge. They are important for someone interested in a quick start with R and Bioconductor. Where relevant, the output generated by R is given in green color.
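As a small illustration of this format, the following lines chain two commands with a semicolon and carry trailing comments. The object names 'x' and 'y' and the values are made up for this sketch and do not come from a later section of the manual.

```r
x <- 1:10; y <- mean(x) # Two commands on one line, separated by ';'. Everything after '#' is ignored by R.
y # Prints the mean of the numbers 1 to 10.
```

Both lines can be pasted into the R console together with their comment text.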
Installation of the R Software and R Packages
The installation instructions are provided in the Administrative Section of this manual.
R working environments with syntax highlighting support and utilities to send code to the R console:
- RStudio: excellent choice for beginners (Cheat Sheet)
- Basic R code editors provided by the R GUIs
- gedit, Rgedit, RKWard, Eclipse, Tinn-R, Notepad++ (NppToR)
- Vim-R-Tmux: R working environment based on vim and tmux
- Emacs (ESS add-on package)
R Projects and Interfaces
Basic R Usage
$ R # Starts the R console under Unix/Linux. The R GUI versions under Windows and Mac OS X can be started by double-clicking their icons.
object <- function(arguments) # This general R command syntax uses the assignment operator '<-' (or '=') to assign data generated by the command on its right to the object on its left.
object = function(arguments) # A more recently introduced assignment operator is '='. At the top level both work the same way, but only the arrow can be reversed ('value -> object'). For consistency reasons one should use only one of them.
assign("x", function(arguments)) # Has the same effect, but uses the assignment function instead of the assignment operator.
source("my_script") # Command to execute an R script, here 'my_script'. For example, generate a text file 'my_script' with the command 'print(1:100)', then execute it with the source function.
x <- edit(data.frame()) # Starts empty GUI spreadsheet editor for manual data entry.
x <- edit(x) # Opens existing data frame (table) 'x' in GUI spreadsheet editor.
x <- scan(what="character") # Lets you enter values from the keyboard or by copy&paste and assigns them to the character vector 'x'. An empty line ends the input.
q() # Quits R console.
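The assignment variants and the source function listed above can be tried out with the following minimal sketch. The object names 'v1' to 'v4' and the temporary script location are assumptions made only for this illustration.

```r
v1 <- sqrt(16)          # Assignment with the arrow operator.
v2 = sqrt(16)           # Assignment with '='; same result at the top level.
sqrt(16) -> v3          # The arrow can also point to the right; '=' cannot be reversed.
assign("v4", sqrt(16))  # Assignment function; the object name is given as a character string.
identical(v1, v2) && identical(v2, v3) && identical(v3, v4) # Checks that all four objects hold the same value.

my_script <- file.path(tempdir(), "my_script") # Hypothetical script location in a temporary directory.
writeLines("print(1:100)", my_script)          # Creates the script file containing one command.
source(my_script)                              # Executes the script, which prints the numbers 1 to 100.
```

Writing the script to tempdir() keeps the working directory clean; in practice the script would usually live in the project directory.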
R Startup Behavior
The R environment is controlled by hidden files in the startup directory: .RData, .Rhistory and .Rprofile (optional)
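Whether these files are present can be checked from within R. The sketch below assumes that the current working directory is still the directory in which R was started; the object name 'startup_files' is chosen only for this example.

```r
getwd()                                                 # Shows the directory R is currently working in.
startup_files <- c(".RData", ".Rhistory", ".Rprofile")  # Names of the hidden startup files.
file.exists(startup_files)                              # Returns TRUE for each file found in the current directory.
list.files(all.files=TRUE, pattern="^\\.R")             # Lists hidden files whose names start with '.R'.
```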