R/Bioconductor Basics Workshop (2-day)
When: Oct. 22nd, 2015 - Oct. 23rd, 2015, 9:30 am - 4:30 pm
About this Class
A Short Course in R for Biologists
"A Short Course in R for Biologists" is a two-day course given in four three-hour sessions entitled: Introduction to R, Introduction to Bioconductor, Introduction to Microarray Analysis, and Introduction to NGS Data Analysis.
Day | Morning Session, 9:30 AM-12:30 PM | Afternoon Session, 1:30 PM-4:30 PM |
---|---|---|
Oct 22 | Introduction to R | Introduction to Bioconductor |
Oct 23 | Introduction to Microarray Analysis | Introduction to NGS Data Analysis |
PLEASE NOTE: This 2-day workshop is a BYOC (Bring Your Own laptop Computer) class. Government-issued or personal computers are permitted. We can supply only a very limited number of computers, so if you want to take the class but cannot bring your own computer, please say so in the Comment section of the registration form.
Web-based resources for this class (see below for PDF versions):
- Introduction to R for Biologists (David Wheeler)
- Introduction to Bioconductor (David Wheeler)
- Introduction to R (Sean Davis)
- Vignettes (Sean Davis)
- Data Files (Fathi Elloumi)
- R script (Fathi Elloumi)
The course includes frequent, short hands-on periods, so students should bring their own laptops with a working installation of R, version 3.1 or later. The course also uses several R packages, which must be installed before the course begins.
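To confirm that an installation meets the R 3.1 requirement, a quick check along these lines can be run at the R prompt. This is a minimal sketch; `getRversion()` and `R.version.string` are part of base R, and `min_version` and `r_ok` are illustrative names.

```r
# Minimum R version required for the course
min_version <- "3.1.0"

# getRversion() returns a version object that compares correctly with a string
r_ok <- getRversion() >= min_version

# Print the full version string of the running R session
cat(R.version.string, "\n")

if (r_ok) {
  message("R meets the course requirement (", min_version, " or later).")
} else {
  message("Please upgrade R to version ", min_version, " or later.")
}
```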
R is a console application. Students who prefer a more graphically oriented working environment will find that running R inside RStudio makes life much easier. If you are comfortable running programs, viewing output, and editing files at the terminal, you do not need RStudio for this course. Even so, RStudio offers many features you may find useful, and it is well worth a look.
R Installation
The R program and instructions for installing it under Linux, Mac OS X, and Windows can be found here:
Bioconductor and Bioconductor Package Installation
Complete instructions for installing the basic and additional Bioconductor packages can be found here:
In addition to the basic Bioconductor packages, please install the following packages before the class starts (survival, ggplot2, and gplots are distributed through CRAN rather than Bioconductor, but the Bioconductor installer below handles them as well):
- Biostrings
- BSgenome
- BSgenome.Celegans.UCSC.ce6
- TxDb.Celegans.UCSC.ce6.ensGene
- GenomicFeatures
- GenomicRanges
- GenomicAlignments
- TxDb.Hsapiens.UCSC.hg19.knownGene
- affy
- simpleaffy
- arrayQualityMetrics
- limma
- survival
- ggplot2
- hthgu133acdf
- hthgu133a.db
- gplots
Run from within an R session, the following code should install the basic Bioconductor packages along with the additional packages listed above:
# First, run the Bioconductor installer script, which makes biocLite() available
source("http://bioconductor.org/biocLite.R")
# Now, use the installer to install all of the course packages at once;
# Biobase, the foundation package, is installed automatically as a dependency
biocLite(pkgs = c("Biostrings", "BSgenome", "BSgenome.Celegans.UCSC.ce6",
                  "TxDb.Celegans.UCSC.ce6.ensGene", "GenomicFeatures",
                  "GenomicRanges", "GenomicAlignments",
                  "TxDb.Hsapiens.UCSC.hg19.knownGene", "affy", "simpleaffy",
                  "arrayQualityMetrics", "limma", "survival", "ggplot2",
                  "hthgu133acdf", "hthgu133a.db", "gplots"))
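Once installation finishes, it is worth confirming that every package on the list can actually be found. The minimal sketch below uses base R's `requireNamespace()`, which checks whether a package can be loaded without attaching it; `course_pkgs`, `is_installed`, and `missing_pkgs` are illustrative names, and the vector simply repeats the package list above. Note that `biocLite()` was replaced by the `BiocManager` package starting with Bioconductor 3.8 (R 3.5), so on newer R versions `BiocManager::install()` takes its place, as the comment indicates.

```r
# Packages required for the course (same list as in the installation step)
course_pkgs <- c("Biostrings", "BSgenome", "BSgenome.Celegans.UCSC.ce6",
                 "TxDb.Celegans.UCSC.ce6.ensGene", "GenomicFeatures",
                 "GenomicRanges", "GenomicAlignments",
                 "TxDb.Hsapiens.UCSC.hg19.knownGene", "affy", "simpleaffy",
                 "arrayQualityMetrics", "limma", "survival", "ggplot2",
                 "hthgu133acdf", "hthgu133a.db", "gplots")

# requireNamespace() returns TRUE if a package can be loaded and FALSE
# otherwise; it does not attach the package to the search path
is_installed <- vapply(course_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- course_pkgs[!is_installed]

if (length(missing_pkgs) == 0) {
  message("All ", length(course_pkgs), " course packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
  # With the 2015-era installer:  biocLite(missing_pkgs)
  # On R 3.5 and later:           BiocManager::install(missing_pkgs)
}
```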
RStudio Installation
Install the "Desktop, Open Source Edition":
Class Outline
Day 1 (Oct 22), Morning Session: Introduction to R
- The R environment
- Starting an R Session, Setting Options
- Listing Variables, Editing Commands, Using the R History
- Getting Help on an R Function
- Logging a Session to a File
- Running External R Code
- Installing and Loading Packages
- Ending a Session, Saving Your Work
- The Elements of R
- Numeric
- Character
- Logical
- Missing Values
- R Data Structures
- Vectors
- Matrices
- Lists
- Data Frames
- Factors
- Functions
- Other Complex Structures
- Procedures
- Reading and Writing Data
- Exploring and Summarizing Data
- Dealing with Missing Data
- Restructuring Data
- Relabeling Data
- Subsetting Data
- Operating on Rows or Columns of Data
- Saving R Objects for Later Use
- Graphing Data
- Simple Statistical Tests
- Example: A Simple Analysis of Probe Intensity Data
- Project: Creating a Graphical Function in 4 Easy Steps
- Step 1: Create an X-Y Plot to Compare Two Arrays
- Step 2: Package the X-Y Plot as a Function
- Step 3: Create a Median Array as a Better Standard for Comparison
- Step 4: Rotate and Scale the Plot (Voilà, You Have Created an MA Plot!)
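The four-step project above can be sketched in base R as follows. This is a minimal illustration, not the course's own script: the intensity matrix is simulated with `rlnorm()` rather than read from real arrays, and the function names `xy_plot()` and `ma_plot()` are hypothetical. The key idea is that an MA plot is the X-Y plot rotated by 45 degrees and rescaled: for each probe, M is the log2 ratio of an array to the reference (here the probe-wise median array) and A is their average log2 intensity.

```r
set.seed(1)
# Simulated probe intensities: 1000 probes (rows) x 4 arrays (columns)
intensities <- matrix(rlnorm(4000, meanlog = 7, sdlog = 1.5), nrow = 1000,
                      dimnames = list(NULL, paste0("array", 1:4)))
log_int <- log2(intensities)

# Step 1: X-Y plot comparing two arrays on the log2 scale
plot(log_int[, 1], log_int[, 2], pch = ".",
     xlab = "array1 (log2)", ylab = "array2 (log2)")
abline(0, 1, col = "red")  # identity line: points on it agree exactly

# Step 2: package the X-Y plot as a reusable function
xy_plot <- function(x, y, ...) {
  plot(x, y, pch = ".", ...)
  abline(0, 1, col = "red")
}
xy_plot(log_int[, 1], log_int[, 3], xlab = "array1 (log2)", ylab = "array3 (log2)")

# Step 3: a median array (probe-wise median across all arrays) as the reference
median_array <- apply(log_int, 1, median)

# Step 4: rotate and scale; M = log2 ratio, A = average log2 intensity
ma_plot <- function(x, ref, ...) {
  M <- x - ref
  A <- (x + ref) / 2
  plot(A, M, pch = ".", ...)
  abline(h = 0, col = "red")  # M = 0 means no difference from the reference
  invisible(data.frame(A = A, M = M))
}

ma <- ma_plot(log_int[, 1], median_array, main = "array1 vs. median array")
```

Using the median array as the reference keeps any single unusual array from distorting the comparison, which is why Step 3 precedes the rotation.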
Day 1 (Oct 22), Afternoon Session: Introduction to Bioconductor
- Installing Bioconductor
- An Overview of Bioconductor Packages
- Fundamental Packages
- Biobase: the Foundation
- Biostrings: A Representation of Biological Sequences
- BSgenome: A Representation of Complete Genomic Sequences
- GenomicRanges: Manipulation of Genomic Intervals
- GenomicFeatures: Manipulation of Genomic Features
- GenomicAlignments: Manipulation of Short Genomic Alignments
- Two Fundamental Structures to Contain Experiment Data
- The ExpressionSet for Array Data
- Constructing an ExpressionSet
- Analyzing an ExpressionSet
- The SummarizedExperiment for NGS Sequence Data
- Constructing a SummarizedExperiment
- Analyzing a SummarizedExperiment
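As a minimal sketch of constructing and analyzing an ExpressionSet, the code below wraps a simulated expression matrix and a small sample table using Biobase's `ExpressionSet()` and `AnnotatedDataFrame()` constructors. The gene names, sample names, and the `group` column are invented for illustration; with real Affymetrix data the matrix would instead come from a pre-processing step such as `affy::rma()`, which itself returns an ExpressionSet.

```r
library(Biobase)

set.seed(42)
# Simulated log2 expression values: 5 genes (rows) x 6 samples (columns)
expr <- matrix(rnorm(30, mean = 8), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:6)))

# Sample annotation: row names must match the column names of the matrix
samples <- data.frame(group = rep(c("normal", "tumor"), each = 3),
                      row.names = colnames(expr))

# Construct the ExpressionSet from the matrix and the phenotype data
eset <- ExpressionSet(assayData = expr,
                      phenoData = AnnotatedDataFrame(samples))

dim(eset)           # number of features and samples
head(exprs(eset))   # the expression matrix
pData(eset)         # the sample table

# Subsetting follows [features, samples]; eset$group reads a pData column
tumor <- eset[, eset$group == "tumor"]

# A simple analysis: mean expression of each gene within each group
group_means <- sapply(split(sampleNames(eset), eset$group),
                      function(s) rowMeans(exprs(eset)[, s]))
```

Keeping the expression values and the sample annotation in one object means that subsetting samples updates both together, which is the main reason to prefer an ExpressionSet over a bare matrix.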
Day 2 (Oct 23), Morning Session: Introduction to Microarray Analysis
This session introduces students to the analysis of microarray data using R and Bioconductor. To help students take advantage of the microarray services offered by the Laboratory of Molecular Technology at NCI-Frederick, the session focuses on analyzing data from Affymetrix chips. Students are assumed to have some familiarity with microarray workflows.
- Downloading Data from The Cancer Genome Atlas Databases
- Preliminary Steps: Array Pre-Processing
- Checking the Quality of Arrays
- Performing Array Normalization
- Identifying Differentially Expressed Genes
- Data Visualization
- Performing Principal Component Analysis (PCA)
- Computing and Interpreting Heatmaps
- Computing and Interpreting Kaplan-Meier Curves
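The final steps of this outline can be previewed with base R and the survival package alone. The sketch below is an illustration under stated assumptions, not the session's workflow: the expression matrix, group labels, and survival times are all simulated, and a per-gene Welch t-test with Benjamini-Hochberg adjustment stands in for the moderated statistics that limma provides. The functions used (`t.test()`, `p.adjust()`, `prcomp()`, `heatmap()`, and `survival::survfit()`) are real.

```r
library(survival)

set.seed(2015)
# Simulated log2 expression: 200 genes x 10 samples (5 normal, 5 tumor)
group <- factor(rep(c("normal", "tumor"), each = 5))
expr <- matrix(rnorm(2000, mean = 8), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("s", 1:10)))
# Shift the first 20 genes upward in tumor samples to create a signal
expr[1:20, group == "tumor"] <- expr[1:20, group == "tumor"] + 2

# Differential expression: Welch t-test per gene, then BH-adjusted p-values
pvals <- apply(expr, 1, function(g) t.test(g ~ group)$p.value)
adj_p <- p.adjust(pvals, method = "BH")
top_genes <- names(sort(adj_p))[1:20]

# PCA: prcomp() expects samples as rows, so transpose genes x samples
pca <- prcomp(t(expr), scale. = TRUE)
plot(pca$x[, 1], pca$x[, 2], col = as.integer(group), pch = 19,
     xlab = "PC1", ylab = "PC2")

# Heatmap of the top-ranked genes, scaled by row and clustered on both axes
heatmap(expr[top_genes, ], scale = "row")

# Kaplan-Meier curves from simulated follow-up times and event indicators
clinical <- data.frame(time = rexp(10, rate = 0.1),
                       status = rbinom(10, 1, 0.7),
                       group = group)
km <- survfit(Surv(time, status) ~ group, data = clinical)
plot(km, col = 1:2, xlab = "Time", ylab = "Survival probability")
legend("topright", legend = levels(group), col = 1:2, lty = 1)
```

With real data, the moderated t-statistics from limma (`lmFit()` followed by `eBayes()`) are usually preferred over per-gene t-tests, because they borrow variance information across genes when sample sizes are small.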