Microarray Workshop (2 day)
When: Sep. 22nd, 2015 - Sep. 23rd, 2015 9:30 am - 4:30 pm
About this Class
Learn the basics of microarray gene expression analysis using Partek Genomics Suite and open source tools. As we walk through a hands-on analysis of a cancer dataset, you will learn the principles of experimental design, batch correction, and statistics, and how to extract biological meaning from the results using tools for gene set and pathway analysis.
PLEASE NOTE: This 2 day workshop is a BYOC (Bring Your Own Computer) class; please bring a laptop. Government-issued or personal computers are permitted. We can supply only a very limited number of computers, so if you want to take the class but cannot bring your own computer, please say so in the Comment section of the registration form.
Directions to FAES Classroom 7 (B1C206) can be found here: http://www.faes.org/announcements/directions_faes_classrooms_nih_campus
Day 1 - AM (9:30-11:30): Introductory Lecture
(Maggie Cam, PhD - CCR, NCI)
Introduction
- Historical Perspective
- Microarray Technologies, Sample Processing Methods
- Microarray comparisons to RNA-Seq
Data Analysis
- Experimental Design
- QC methods
- Preprocessing: Normalization and low level analysis algorithms
Statistical Analysis
- Common statistical models used for analysis of microarray data
- Examples of blocking
- Batch effects and removal methods
Visualization and Clustering
- Volcano Plot
- Principal Components Analysis
- Hierarchical Clustering
- K-means Clustering
Validation and Downstream Analysis
- Validation methods
- Gene Ontology Enrichment and Pathway analysis tools
- Major Software applications
- Public Repositories of Microarray Data
Day 1 - PM (2:00-4:30 pm): Hands-on Gene Expression Data Analysis in Partek Genomics Suite
(Xiaowen Wang, PhD - Partek)
Attendees will learn how to use the basic features of Partek Genomics Suite for the analysis of gene expression data. An Affymetrix gene expression dataset will be used to work through the Gene Expression workflow:
- Import data
- Perform QA/QC of imported data
- Exploratory data analysis
- Detect differential expression (ANOVA)
- Gene list creation
Day 2 - AM (9:30-11:30): Hands-on Gene Expression Data Analysis in Partek Genomics Suite - Continued
(Xiaowen Wang, PhD - Partek)
- Biological interpretation
- Visualization (PCA, histogram, box plot, dot plot, volcano plot, interaction plot, heatmap, etc.)
Day 2 - PM (1:30-2:30): GEO2R
(Parthav Jailwala, MSc - CCBR, NCI)
GEO2R is an interactive web tool that allows users to compare two or more groups of samples in a GEO Series in order to identify genes that are differentially expressed across experimental conditions. GEO2R performs comparisons on the original submitter-supplied processed data tables using the GEOquery and limma R packages from the Bioconductor project. Bioconductor is an open source software project based on the R programming language that provides tools for the analysis of high-throughput genomic data. The GEOquery R package parses GEO data into R data structures that can be used by other R packages. The limma (Linear Models for Microarray Data) R package has emerged as one of the most widely used statistical tools for identifying differentially expressed genes. It handles a wide range of experimental designs and data types and applies multiple-testing corrections to P-values to help control the occurrence of false positives. GEO2R thus provides a simple interface that allows users to perform R statistical analysis without command line expertise.
Lecture
- Background on GEO datasets
- What is GEO2R and how can it help you
- How to use GEO2R
- Options and features
- Limitations and caveats
- Hands-on exercise
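To make the multiple-testing correction mentioned in the GEO2R description concrete, here is a minimal Python sketch of one widely used adjustment, the Benjamini-Hochberg false discovery rate procedure. It is an illustration only: GEO2R itself runs limma in R, and the function name and the p-values below are made up for this example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five probes (not real data).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adj_p = benjamini_hochberg(raw_p)
for raw, adj in zip(raw_p, adj_p):
    print(f"raw={raw:.3f}  adjusted={adj:.4f}")
```

An adjusted p-value is never smaller than the raw one, so a probe that looks significant at a raw cutoff of 0.05 may no longer pass after correction.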
Day 2 - PM (2:30-3:30): DAVID
(David/Dawei Huang, M.D. - LMB, CCR, NCI)
The Database for Annotation, Visualization and Integrated Discovery (DAVID) provides a comprehensive set of functional annotation tools that help investigators understand the biological meaning behind large lists of genes.
Lecture
- Brief principle of DAVID gene enrichment analysis
- Term-centric analysis of a large gene list
- Gene-centric analysis of a large gene list
- Pathway map view of a large gene list
- Nature Protocols 4:44 (http://www.nature.com/nprot/journal/v4/n1/abs/nprot.2008.211.html)
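The idea behind gene enrichment analysis can be sketched as an over-representation test: given a gene list, how surprising is the number of its genes that carry a particular annotation term? The Python sketch below computes a plain one-sided hypergeometric p-value. It is not DAVID's own calculation (DAVID reports a modified Fisher exact statistic, the EASE score), and the function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(list_hits, list_size, background_hits, background_size):
    """One-sided hypergeometric p-value: P(X >= list_hits).

    list_hits        genes in the user's list annotated with the term
    list_size        total genes in the user's list
    background_hits  genes in the background annotated with the term
    background_size  total genes in the background
    """
    total = comb(background_size, list_size)
    upper = min(list_size, background_hits)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 30 of 300 listed genes carry a term that
# annotates 400 of 20000 background genes (about 6 expected by chance).
p = enrichment_p_value(30, 300, 400, 20000)
print(f"enrichment p-value: {p:.3e}")
```

In practice one p-value is computed per annotation term, so the results also need a multiple-testing correction before they are interpreted.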
Day 2 - PM (3:30-4:30): Gene Set Enrichment Analysis (GSEA)
(Maggie Cam, PhD - CCR, NCI)
GSEA is a computational method that determines which (if any) a priori defined sets of genes are significantly differentially expressed, as an ensemble, between two biological states. It is an open-source program developed by the Broad Institute: http://www.broadinstitute.org/gsea/index.jsp
Lecture
- The general approach of gene set enrichment methods and comparison with DAVID
- How GSEA measures differential expression for each set of genes
- Controlling effects of multiple comparisons in GSEA (false discovery rate)
- The Broad Institute library of groups of gene sets (MSigDB)
- What files and formats are needed for GSEA
- User options and running GSEA
Hands-on
- Loading the GSEA required input files for an example dataset
- Using and choosing values in the GSEA GUI
- Rank-based analysis
- Full dataset analysis
- Understanding the GSEA outputs and judging significance in the results
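As a rough illustration of how a rank-based method scores a gene set, the Python sketch below computes a running-sum enrichment score over a ranked gene list: the sum steps up at genes in the set (weighted by their ranking score) and down at genes outside it, and the score is the largest deviation from zero. This is a simplified sketch, not the Broad Institute implementation; the function name, gene names, and scores are invented, and the permutation-based significance and false discovery rate steps are omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes and scores must already be sorted by score, highest first.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    hit_total = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    miss_total = len(ranked_genes) - sum(in_set)
    if hit_total == 0 or miss_total == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** weight / hit_total  # step up at a set member
        else:
            running -= 1.0 / miss_total  # step down otherwise
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


# Hypothetical ranked list of eight genes (illustration only).
genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
stats = [3.0, 2.5, 2.0, 1.0, -0.5, -1.0, -2.0, -3.0]

es_top = enrichment_score(genes, stats, {"g1", "g2", "g4"})
es_bottom = enrichment_score(genes, stats, {"g7", "g8"})
print(f"set near the top:    ES = {es_top:.3f}")
print(f"set near the bottom: ES = {es_bottom:.3f}")
```

A positive score indicates that the set is concentrated at the top of the ranking, a negative score that it is concentrated at the bottom.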