Lesson 12: RNA sequencing review 1
Learning objectives
Here, we will do a quick review of what we have learned about RNA sequencing in Lessons 8 through 11.
Accessing the Biostar handbook
The URL for the Biostar handbook is https://www.biostarhandbook.com.
To sign in, scroll to the bottom of the page and click the button that says Access Your Account.
Once you have signed in, you will find that the handbook is composed of several books, including one on RNA sequencing.
Because a Biostar handbook subscription is only valid for 6 months, we recommend that you download either the PDF or the eBook version.
Review of RNA sequencing concepts
- Purpose of RNA sequencing and the biological questions it can answer
- Experimental considerations
  - Sample preparation
  - Replicates
  - Technical noise
  - Read depth
    - More depth to detect lowly expressed genes
    - More depth to detect small expression differences between samples
  - RNA quality
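To make the read-depth points concrete, here is a small back-of-the-envelope calculation. The numbers are hypothetical (a gene assumed to account for one in every million sequenced fragments), and the noise estimate assumes a simple Poisson sampling model, under which the standard deviation of a count equals the square root of its mean. The idea: more depth puts more reads on a lowly expressed gene and proportionally reduces counting noise, which makes small differences between samples easier to detect.

```python
import math

def expected_reads(total_reads, fraction):
    """Expected number of reads on a gene that accounts for `fraction`
    of all sequenced fragments, at a given sequencing depth."""
    return total_reads * fraction

def relative_counting_noise(mean_count):
    """Relative noise (SD / mean) of a read count, assuming Poisson
    sampling, where SD = sqrt(mean)."""
    return 1 / math.sqrt(mean_count)

# Hypothetical lowly expressed gene: 1 in 1,000,000 fragments
fraction = 1e-6
for depth in (5_000_000, 20_000_000, 50_000_000):
    reads = expected_reads(depth, fraction)
    noise = relative_counting_noise(reads)
    print(f"{depth:>10,} total reads: ~{reads:.0f} reads on the gene, "
          f"~{noise:.0%} counting noise")
```

At 5 million total reads this gene gets about 5 reads with roughly 45% counting noise; at 50 million it gets about 50 reads with roughly 14%.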
RNA sequencing analysis considerations
What files do we need for RNA sequencing analysis?
Solution
- Reference genome or transcriptome
- Annotation files (GFF or GTF) that tell us the genomic features (e.g., genes, transcripts) and where they are located
- Raw sequencing data in FASTQ format (file extension fastq or fq)
Review of reference genome and annotation files
Why do we need a reference genome?
Solution
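The reference genome serves as a "known" that guides the analysis: instead of reconstructing the sample's sequences from scratch, we align (map) each sequencing read to the reference to determine where in the genome it came from.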
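As a toy illustration of placing a read on a reference, the function below finds every position where a read matches the reference exactly. This is a deliberate simplification, not how real aligners work: they tolerate sequencing errors and mismatches, use indexes for speed, and, for RNA sequencing, must handle reads that span splice junctions. The reference and reads are made up.

```python
def map_read_exact(reference: str, read: str) -> list[int]:
    """Return every 0-based position where `read` occurs exactly in `reference`."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Search again one base later so overlapping matches are found too
        start = reference.find(read, start + 1)
    return positions

# Hypothetical toy reference and reads
reference = "ATGCGTACGTTAGCATGCGTA"
reads = ["CGTACG", "TAGCAT", "ATGCG", "GGGGGG"]
for read in reads:
    print(read, map_read_exact(reference, read))
```

Here ATGCG maps to two positions (0 and 14), a small example of a multi-mapping read, while GGGGGG matches nowhere and returns an empty list.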
What file format is the reference genome in and what information does it contain?
Solution
The reference genome is stored in FASTA format. These files have the extension fasta or fa; the two extensions are used interchangeably.
Each sequence in a FASTA file starts with a definition (header) line beginning with ">", followed by one or more lines of nucleotide sequence.
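A minimal sketch of reading a FASTA file, assuming each record is a ">" definition line followed by one or more sequence lines (genome FASTA files usually wrap long sequences across many lines). The example records are hypothetical.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    The header is the definition line without the leading '>'; the
    sequence lines that follow it are joined into one string.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

example = """>chr1 toy chromosome
ACGTACGTAC
GTTAGC
>chr2
TTGACC""".splitlines()

for name, seq in parse_fasta(example):
    print(name, len(seq), seq)
```

To read a real file, pass an open file handle instead of the example list (for instance `with open("genome.fa") as fh:`, where `genome.fa` is a placeholder name). This sketch holds each chromosome in memory as one string, which is fine for illustration but not ideal for very large genomes.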
What is the annotation file used for?
Solution
The annotation file lists the features of a genome (e.g., genes, transcripts, exons) along with their coordinates and other information. Annotation files are useful in RNA sequencing because they tell us which genes or transcripts the aligned reads overlap, which lets us build a table of expression counts for our samples on either a per-gene or a per-transcript basis.
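The sketch below shows, in simplified form, how an annotation file turns aligned reads into a per-gene count table. It assumes GTF-style lines (nine tab-separated columns, 1-based inclusive coordinates, a `gene_id` attribute in the ninth column) and represents each aligned read as a hypothetical (chromosome, start, end) tuple. Real counting tools also consider exon structure, strand, and reads that overlap more than one gene, all of which this sketch ignores.

```python
import re
from collections import Counter

def parse_gtf_genes(lines):
    """Return (chrom, start, end, gene_id) for every 'gene' feature.

    GTF has nine tab-separated columns; start/end are 1-based and
    inclusive, and the ninth column holds attributes like gene_id "...";
    """
    genes = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match:
            genes.append((fields[0], int(fields[3]), int(fields[4]), match.group(1)))
    return genes

def count_reads_per_gene(genes, aligned_reads):
    """Count aligned reads overlapping each gene (same 1-based coordinates).

    A read is counted toward every gene it overlaps (a simplification).
    """
    counts = Counter({g[3]: 0 for g in genes})
    for r_chrom, r_start, r_end in aligned_reads:
        for g_chrom, g_start, g_end, gene_id in genes:
            if r_chrom == g_chrom and r_start <= g_end and r_end >= g_start:
                counts[gene_id] += 1
    return counts

# Hypothetical annotation lines and aligned reads
gtf_lines = [
    'chr1\ttoy\tgene\t100\t500\t.\t+\t.\tgene_id "GENE_A";',
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";',
    'chr1\ttoy\tgene\t800\t1200\t.\t-\t.\tgene_id "GENE_B";',
]
reads = [("chr1", 150, 199), ("chr1", 480, 529), ("chr1", 900, 949), ("chr1", 600, 649)]
print(count_reads_per_gene(parse_gtf_genes(gtf_lines), reads))
```

In this example GENE_A receives 2 reads, GENE_B receives 1, and the read at 600 to 649 falls between genes and is not counted.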
Review of FASTQ files
What is a FASTQ file?
Solution
FASTQ (file extension fastq or fq) is the format used for files that contain our sequencing data. Like a FASTA file, in which each header line starts with ">" and is followed by sequence, a FASTQ file has a header line for each sequencing read, but this line starts with "@". The read sequence follows this metadata line, and it is followed by a line beginning with a "+" sign and then a line containing the quality score of each base in the read.
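A minimal FASTQ parser that reads records four lines at a time and decodes the quality line. It assumes the Phred+33 encoding used by current Illumina data, where each quality character's score is its ASCII code minus 33. Reading strictly four lines per record matters because a quality line may itself begin with "@". The example reads are hypothetical.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples, four lines per record."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input is not a multiple of four lines")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed FASTQ record #{i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"Sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual

def phred33_scores(qual):
    """Decode quality characters, assuming Phred+33 (score = ASCII code - 33)."""
    return [ord(ch) - 33 for ch in qual]

example = ["@read1", "ACGTAC", "+", "IIIII#",
           "@read2", "TTGCA", "+", "?5+&!"]
for name, seq, qual in parse_fastq(example):
    print(name, seq, phred33_scores(qual))
```

A score of 40 ("I") corresponds to a 1 in 10,000 chance that the base call is wrong, while low scores such as 2 ("#") flag unreliable bases.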
What tool can we use to assess the quality of sequencing data, and how do we aggregate several FastQC reports into one?
Solution
FastQC
To aggregate multiple FastQC reports into a single report, we can use MultiQC.
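FastQC reports several quality checks; one of them, the per-base sequence quality plot, summarizes quality scores at each position along the reads. The sketch below computes only the mean score per position, as a rough illustration of that idea and not FastQC's actual implementation. It assumes Phred+33 quality strings; the example strings are hypothetical.

```python
def per_base_mean_quality(quality_strings):
    """Mean Phred score at each read position across all reads.

    Assumes Phred+33 encoding. Reads may differ in length; each position
    is averaged over only the reads long enough to cover it.
    """
    totals, counts = [], []
    for qual in quality_strings:
        for pos, ch in enumerate(qual):
            if pos == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - 33
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]

quals = ["IIII5", "II?+", "I?5"]
print([round(q, 1) for q in per_base_mean_quality(quals)])
```

Mean quality typically declines toward the ends of reads, which is one reason the trimming step discussed next is useful.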
What type of data clean up can we perform on sequencing data prior to downstream analysis?
Solution
We can trim away adapter sequences and low-quality bases, and discard reads that become too short after trimming. Trimmomatic is one tool that can do this.
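A simplified sketch of sliding-window quality trimming, one of the operations Trimmomatic offers (adapter clipping is not shown). It scans the read from the start, cuts at the first window whose mean quality falls below a threshold, and discards the read if what remains is too short. The function name, parameter names, and defaults are illustrative assumptions rather than Trimmomatic's own options, and the cut point is simplified: it drops the entire failing window.

```python
def sliding_window_trim(seq, qual, window=4, min_avg_q=20, min_len=36):
    """Trim a read at the first window whose mean quality is below min_avg_q.

    Hypothetical helper; assumes Phred+33 quality strings.
    Returns the trimmed (seq, qual), or None if fewer than min_len bases remain.
    """
    scores = [ord(ch) - 33 for ch in qual]
    cut = len(scores)  # keep the whole read unless a window fails
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_avg_q:
            cut = start
            break
    if cut < min_len:
        return None  # too short after trimming: discard the read
    return seq[:cut], qual[:cut]

print(sliding_window_trim("ACGTACGTAC", "IIIIII####", min_len=3))
print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))
```

With `min_len=3`, the example read is trimmed to ACGTA; with the default `min_len=36`, the same read is discarded and the function returns None.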