Bioinformatics for Beginners: RNA-Seq

Course Description

Bioinformatics integrates biology, statistics, and computer science to develop and apply theory, methods, and tools for the collection, storage, and analysis of biological and related data. Some key application areas in bioinformatics include genomic and molecular analysis, drug discovery and development, medical diagnosis and treatment, agricultural biotechnology, and environmental monitoring. The National Cancer Institute (NCI) uses bioinformatics extensively in its research efforts to combat cancer, including research on the "origin, evolution, progression, and treatment of cancer".

This course teaches the basic skills needed for bioinformatics, including working on the Unix command line, with a primary focus on RNA-Seq analysis. It covers every step of the RNA-Seq workflow, from raw data to differential expression and gene ontology analysis. Many of these skills are foundational to most bioinformatics analyses and apply equally to other types of next-generation sequencing experiments.

Why learn bioinformatics?

Here are a few compelling reasons to explore the world of bioinformatics:

  • Analyze Your Data: Explore your own biological data and draw insights from it directly.

  • Enhance Your Scientific Skills: Broaden your scientific knowledge by mastering bioinformatics tools and techniques. Understanding the principles behind data collection and analysis will help you design robust experiments and interpret their results effectively.

  • Career Opportunities: Open doors to exciting career paths in the rapidly growing field of bioinformatics.

  • Understand the Data Landscape: Gain a deeper appreciation for how others analyze biological data, fostering collaboration and critical thinking.


This course is divided into three modules.

Module 1: Unix and Biowulf

Lessons focus on developing command line skills, getting started and working on Biowulf (the NIH HPC cluster), and downloading data from NCBI.

Module 2: RNA-Seq Analysis

Lessons focus on RNA-Seq analysis, including experimental design and best practices, quality control, trimming, alignment-based methods, feature counting, differential expression analysis, and biological interpretation.

  • Lesson 6 - Introduction to RNA-Seq
  • Lesson 7 - Introduction to Next Generation Sequencing (NGS) Data and Quality Control
  • Lesson 8 - Cleaning and Preparing Next Generation Sequencing (NGS) Data for Downstream Analysis
  • Lesson 9 - Aligning Next Generation Sequencing (NGS) Data to Genome
  • Lesson 10 - Quantifying Gene Expression from RNA Sequencing Data
  • Lesson 11 - Visualizing Genomic Data: Preparing Files
  • Lesson 12 - Visualizing Genomic Data with the Integrative Genomics Viewer
  • Lesson 13 - Differential Expression Analysis: QC
  • Lesson 14 - Differential Expression Analysis for Bulk RNA Sequencing: The Actual Analysis

Module 3: Pathway Analysis

Lessons focus on gene ontology and pathway analysis.

  • Lesson 15 - Introduction to Gene Ontology and Pathway Analysis
  • Lesson 16 - Functional Enrichment with DAVID
  • Lesson 17 - Pathway Analysis with Reactome

Course requirements:

Who can take this course?

There are no prerequisites to take this course. This course is open to NCI-CCR researchers interested in learning bioinformatics skills, especially those relevant to analyzing bulk RNA sequencing data.

How will we work through lesson content?

For the hands-on sessions, participants will use Biowulf student accounts. To sign up for a student account, click here. Student accounts are only available to course registrants.

Lesson content and practice questions can be found in these pages.

Class documents are available at https://bioinformatics.ccr.cancer.gov/docs/bioinformatics-for-beginners-2025/.

Class data

The links below point to the class data for participants who would like to practice outside of or after this course series. There is no need to download these files for the course itself, because the instructors have made them available on Biowulf. If you do not have access to Biowulf, follow the instructions below.

Module 1

You can find the compressed Module 1 data here. Download the archive and unzip it:

unzip module_1.zip  
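If you would rather preview the archive before extracting it, the step above can be wrapped in a small helper. This is only a sketch: the function name `extract_archive` and the default destination directory are assumptions made for illustration and are not part of the course materials. It relies only on the standard `unzip` options `-l` (list contents), `-q` (quiet), `-o` (overwrite), and `-d` (destination directory).

```shell
# Hypothetical helper (not part of the course materials):
# list a zip archive's contents, then extract it into its own directory.
extract_archive() {
    archive="$1"                  # path to the archive, e.g. module_1.zip
    dest="${2:-${archive%.zip}}"  # default destination: archive name without ".zip"

    # Stop early with a clear message if the download is missing.
    if [ ! -f "$archive" ]; then
        echo "Archive not found: $archive" >&2
        return 1
    fi

    unzip -l "$archive"                 # preview the file listing without extracting
    unzip -q -o "$archive" -d "$dest"   # extract quietly into $dest, overwriting existing files
}
```

For example, `extract_archive module_1.zip` would extract into a directory named `module_1`, assuming the archive sits in the current directory.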

Module 2

All Module 2 data were obtained from the Griffith lab RNA sequencing tutorial and renamed for this course series.

The reference genome can be downloaded at https://rnabio.org/module-01-inputs/0001/02/01/Reference_Genomes/

The annotation can be downloaded at https://rnabio.org/module-01-inputs/0001/03/01/Annotations/

See https://rnabio.org/module-01-inputs/0001/05/01/RNAseq_Data/ for instructions on downloading the HBR-UHR and hcc1395 data.


Contact Us

Email ncibtep@nih.gov if you have any comments, questions, or concerns.