Bioinformatics for beginners
Module 2: Introduction to RNA sequencing
In this module, we will use the Human Brain Reference and Universal Human Reference RNA sequencing datasets to learn about RNA sequencing. Each lesson will be followed by a practice session where you can ask questions and practice what we have learned using the Golden Snidget dataset from the Biostars Handbook.
Week 4
Lesson 8: Introduction to experimental and analysis components of RNA sequencing
- Overview of wet lab procedures
- Library preparation process, including
- Adapters and indices
- Single-end and paired-end sequencing
- Strandedness
- Coverage and depth
- Spike-ins
- Replicates
- Batch effects
- Overview of analysis procedures
- References and annotation files needed for analysis
- Methods for quality checking data
- Alignment of raw sequences to genome/transcriptome
- Quantifying expression
- Relevant statistics
- Normalization
- Tools available for differential expression analysis and extracting biological insights from data
- Preview of datasets that we will be working with to explore RNA sequencing analysis
- Golden Snidget from Biostars
- HBR and UHR datasets (use for practice questions)
- Preview of upcoming RNA sequencing lessons
Week 5
Lesson 9: Understanding the data, reference genome, and annotation files
- Get to know the HBR and UHR datasets
- Introduce ourselves to the concept of reference genomes
- Provide a brief introduction to the Integrative Genomics Viewer (IGV), focusing on visualizing our reference and annotation files
Lesson 10: FASTQ structure and assaying FASTQ quality
- Learn about the structure of FASTQ files
- Create a text file containing the base names of the HBR and UHR FASTQ files so that we can use them in later lessons (base names are file names without the extension)
- Use the command line to retrieve some basic FASTQ file sequence statistics
- Use FASTQC, a prebuilt application, to generate quality metrics for FASTQ files
- Interpret the quality check results generated by FASTQC
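As a preview of what this lesson covers, the sketch below shows how a FASTQ file is organized (four lines per read: identifier, sequence, a `+` separator, and quality string), how a base name can be derived from a file name, and how simple statistics such as read count and mean read length can be computed. In class we will do this with command line tools; this Python version is only an illustration. The file name `HBR_1_R1.fastq.gz` and the two reads are made up for the example and are not taken from the course data.

```python
import io
from pathlib import Path


def base_name(filename):
    """Return the file name without its directory or FASTQ extensions."""
    name = Path(filename).name
    # Strip a compression suffix first, then the FASTQ suffix itself.
    for suffix in (".gz", ".fastq", ".fq"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name


def read_fastq(handle):
    """Yield (identifier, sequence, quality) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line carries no data we need
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


def fastq_stats(handle):
    """Count reads and bases and compute the mean read length."""
    n_reads = 0
    n_bases = 0
    for _, sequence, _ in read_fastq(handle):
        n_reads += 1
        n_bases += len(sequence)
    mean_length = n_bases / n_reads if n_reads else 0.0
    return {"reads": n_reads, "bases": n_bases, "mean_length": mean_length}


# Hypothetical two-read FASTQ content, held in memory for the example.
example = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACGT\n+\nIIII\n"
stats = fastq_stats(io.StringIO(example))
name = base_name("HBR_1_R1.fastq.gz")  # hypothetical file name
print(name, stats)
```

To run the same statistics on a real, uncompressed file, pass an open file handle (for example `open(path)`) to `fastq_stats` instead of the in-memory example.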
Week 6
Lesson 11: Merge multiple FASTQC reports and trim reads to remove sequencing adapters and low-quality bases
- Merge FASTQC reports using a tool called MULTIQC so that we can interrogate one report rather than multiple.
- Perform quality and adapter trimming of FASTQ files.
Lesson 12: Align raw sequences to the reference genome, learn about alignment output, and work with alignment output
- Learn to align the sequencing reads to the reference genome
- Familiarize ourselves with the content of alignment output files (SAM)
- Learn to use SAMTOOLS to work with alignment output
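To give a sense of what the alignment output looks like, the sketch below parses the 11 mandatory tab-separated fields of a SAM alignment line and uses two FLAG bits (0x4 for unmapped, 0x10 for reverse strand) to summarize a few records. In class we will use SAMTOOLS for this kind of work; the Python code is only an illustration, and the read names, reference name, and positions are made up for the example.

```python
from collections import namedtuple

# The 11 mandatory SAM fields, in order.
SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar rnext pnext tlen seq qual",
)


def parse_sam_line(line):
    """Parse one alignment line; optional tag fields after column 11 are ignored."""
    fields = line.rstrip("\n").split("\t")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(
        qname, int(flag), rname, int(pos), int(mapq),
        cigar, rnext, int(pnext), int(tlen), seq, qual,
    )


def is_unmapped(record):
    """FLAG bit 0x4 is set when the read did not align."""
    return bool(record.flag & 0x4)


def is_reverse(record):
    """FLAG bit 0x10 is set when the read aligned to the reverse strand."""
    return bool(record.flag & 0x10)


def count_mapped(lines):
    """Return (mapped, total) alignment records, skipping '@' header lines."""
    mapped = 0
    total = 0
    for line in lines:
        if line.startswith("@"):
            continue  # header line, not an alignment
        total += 1
        if not is_unmapped(parse_sam_line(line)):
            mapped += 1
    return mapped, total


# Hypothetical SAM content: one header line and three alignment records.
example_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr22\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t16\tchr22\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
mapped, total = count_mapped(example_sam)
second = parse_sam_line(example_sam[2])
print(mapped, total, is_reverse(second))
```

Note that this counts alignment records rather than unique reads, so a file containing secondary or supplementary alignments would need additional FLAG checks.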
Week 7
Lesson 13: Visualize alignments with IGV, post alignment QC, obtain expression counts
- Visualize alignment using IGV
- Post alignment QC
- Obtain expression counts
Lesson 14: Differential expression analysis, introduction to gene ontology and pathway analysis
- Differential expression and interpretation of results
- Brief introduction to gene ontology and pathway analysis
Week 8
Lesson 15: Introduction to classification-based RNA sequencing and review
- Introduction to classification-based RNA sequencing
- Review of RNA sequencing