Bioinformatics Training and Education Program

Workshop on TCGA Data Mining

 When: Mar. 18th, 2014 - Mar. 19th, 2014 9:30 am - 5:00 pm

Bldg 12A, Room B51, Bethesda, MD
Presented By:
Maxwell Lee (CCR, NCI)
This class has ended.

About this Class

The Cancer Genome Atlas (TCGA) is a large-scale study that has catalogued genomic data from more than 20 different types of cancer, including mutations, copy number variation, mRNA and miRNA expression, and DNA methylation. Because the data are publicly distributed, TCGA has become a major resource for cancer researchers in target discovery and in the biological interpretation and assessment of the clinical impact of genes of interest. This two-day workshop will familiarize the audience with the types of data available and with analytical tools, including a number of software packages, that enable end users to mine TCGA data easily and effectively.

Day 1 - Tuesday March 18th 9:30-11:30 am
Introductory Lecture to TCGA Data Analysis
(Maxwell Lee, PhD - CCR NCI)

  1. Introduction
    • A brief history
    • Overview of TCGA data
  2.  Discussion of three TCGA papers
    • Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma.
      Cancer Cell. 2010 May 18;17(5):510-22.
    • Comprehensive molecular portraits of human breast tumours.
      Nature. 2012 Oct 4;490(7418):61-70.
    • Discovery and saturation analysis of cancer genes across 21 tumour types.
      Nature. 2014 Jan 23;505(7484):495-501.
  3. Using TCGA data
  4. Where to download the data?
  5. Some case studies of data analyses

Day 1 - Tuesday March 18th 11:30-12:30 pm
cBioPortal Demo
(Anand Merchant, PhD, CCRIFX)
This publicly accessible web-based resource provides visualization, analysis and download of large-scale cancer genomics data sets.
As of early 2014 the Portal contains data for 15506 tumor samples from 56 cancer studies. This presentation will include:

  • Introduction to the web application – mission and evolving goals – What is the purpose?
  • Website walk-through – Where is the information and how to query it?
  • Review of the Cancer and Data Types available in the underlying cBio database
  • Advantages and Limitations
  • OncoQueryLanguage (OQL) - Key words and Codes
  • Features and Analytics
  • Viewing and Interpretation of results
  • Example Case  with TCGA dataset (Breast Cancer – 2012 Nature publication)
  • References/Tutorials/FAQ/Pre-set queries
  • Q&A

Day 1 - Tuesday March 18th 2:00-5:00 pm
TCGA Data mining using Qlucore (emphasis on expression/methylation)
(Carl-Johan Ivarsson, MSc - Qlucore)
Qlucore Omics Explorer is a user-friendly, interactive software program for visualization and analysis of any large numerical data set, developed especially for biologists. Through a straightforward user interface built on sliders and check-boxes, users can explore and analyze very large data sets. With Qlucore Omics Explorer it is easy to investigate data and evaluate key biological information directly on screen; results appear immediately with only a few mouse clicks. It is possible to work with multiple data sets, and users can introduce as many annotations and clinical parameters as they want, with no limit.
In this workshop you will learn how to use Qlucore Omics Explorer to mine TCGA data. Focus will be on working with two data sets and how to find relationships between gene expression and DNA methylation data.

Learning Objectives:

  • Import data and clinical annotations from TCGA
  • Create new hypotheses and new findings using interactive visualization including PCA and heatmaps
  • Learn how to focus the data mining by using interactive selections and statistical filters
  • Work with both gene expression and DNA methylation data in an integrated manner
  • Generate plots and lists for easy publication
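
For readers who want a feel for the integrated expression/methylation analysis before the session, the idea can be sketched outside Qlucore. The following is a minimal illustration, not Qlucore's method: it assumes two gene-by-sample matrices with matching rows and columns, and computes a per-gene Pearson correlation between expression and methylation. The function name `per_gene_correlation` and the synthetic data are hypothetical and used only for illustration.

```python
import numpy as np


def per_gene_correlation(expression, methylation):
    """Pearson correlation between expression and methylation, gene by gene.

    Both inputs are (genes x samples) arrays whose rows and columns match.
    """
    expr = np.asarray(expression, dtype=float)
    meth = np.asarray(methylation, dtype=float)
    if expr.shape != meth.shape:
        raise ValueError("matrices must have the same shape")
    # Center each gene (row) across samples.
    expr_c = expr - expr.mean(axis=1, keepdims=True)
    meth_c = meth - meth.mean(axis=1, keepdims=True)
    numerator = (expr_c * meth_c).sum(axis=1)
    denominator = np.sqrt((expr_c ** 2).sum(axis=1) * (meth_c ** 2).sum(axis=1))
    # Genes with zero variance get NaN instead of a division error.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denominator > 0, numerator / denominator, np.nan)


# Synthetic stand-in for TCGA data: 3 genes x 50 samples (not real values).
rng = np.random.default_rng(0)
methylation = rng.uniform(0.0, 1.0, size=(3, 50))
expression = np.vstack([
    -2.0 * methylation[0] + 5.0,  # silenced as methylation rises
    3.0 * methylation[1] + 1.0,   # rises with methylation
    np.full(50, 4.0),             # constant expression, correlation undefined
])
correlations = per_gene_correlation(expression, methylation)
print(correlations)
```

Genes with a strongly negative correlation are the usual candidates for methylation-driven silencing; with real data the matrices would first need to be matched on sample barcode and gene identifier.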

Day 2 - Wednesday March 19th 9:30-12:30 pm
BioDiscovery Nexus: TCGA data analysis using Nexus DB (emphasis on Copy Number/mutation)
(Andrea O Hara, PhD, Field Application Scientist, BioDiscovery)
Nexus Copy Number is a platform-independent copy number analysis and visualization tool that includes co-visualization of sequence variants. NCI’s site license now includes unlimited access to TCGA Premier, a database of re-processed, curated and reviewed TCGA samples. The Nexus Copy Number training session will include:

  1. Approaches to optimizing CNV calling from array data.
  2. Downstream analysis of data sets, including:
    • Visualization and statistical approaches for CNV discovery.
    • Stratification by clinical annotation factors or biomarkers.
    • Finding CNVs predictive of survival or other outcome data.
  3. TCGA Premier Data Access:
    • How to access TCGA CNV data directly from Nexus
    • Query and Integration of TCGA CNV tumor profiles

Day 2 - Wednesday March 19th 2:00-5:00 pm
Oncomine: TCGA data analysis (expression, CN, mutation analysis)
(Matthew Anstett, Sr. Market Development Manager)

Oncomine™ Research Edition is a free, powerful web application that integrates and unifies high-throughput cancer profiling data so that target expression across a large number of cancer types and experiments can be accessed online in seconds. Oncomine™ Research Edition includes annual data updates and basic analysis types such as cancer vs. normal, multi-cancer, and co-expression. It features gene and concept summaries, outlier analysis, meta-analysis, and meta-cancer outlier profile analysis (COPA). Oncomine™ Research Premium Edition is a subscription-based software tool for academic researchers that provides additional advanced features and analyses over Oncomine™ Research Edition.

This presentation will include the following topics:


      Oncomine Research Premium Edition

  • Advanced differential expression analysis
  • Cancer Outlier Profile Analysis (COPA)
  • Signature mapping
  • Import/export of findings
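
As background for the COPA topic, the transformation is simple enough to sketch in a few lines. The example below is an illustrative sketch of the published COPA idea (median-center each gene, scale by its median absolute deviation, then score the gene by a high percentile of the transformed values); it is not Oncomine's implementation. The function name `copa_scores`, the choice of the 90th percentile, and the toy matrix are assumptions made for illustration.

```python
import numpy as np


def copa_scores(expression, percentile=90.0):
    """Score each gene (row) by a high percentile of its COPA-transformed values."""
    expr = np.asarray(expression, dtype=float)
    # Median-center each gene across samples.
    median = np.median(expr, axis=1, keepdims=True)
    # Median absolute deviation (MAD) as a robust scale estimate.
    mad = np.median(np.abs(expr - median), axis=1, keepdims=True)
    # Genes with zero MAD cannot be scaled; mark them as NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        transformed = np.where(mad > 0, (expr - median) / mad, np.nan)
    return np.percentile(transformed, percentile, axis=1)


# Toy data (not real measurements): 2 genes x 10 samples.
toy_expression = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],  # evenly spread
    [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.0, 9.0, 10.0],  # two outlier samples
])
scores = copa_scores(toy_expression)
print(scores)
```

The second gene receives a much larger score because a small subset of samples is far above the bulk of the distribution, which is the pattern outlier analysis is designed to surface.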


      Oncomine Gene Browser

  • Mutation frequencies and gain/loss of function prediction
  • DNA copy number frequencies in cancer
  • Gene expression cancer panel
  • Identifying cell line models