
Lesson 12: RNA sequencing review 1

Learning objectives

Here, we will do a quick review of what we have learned about RNA sequencing in Lessons 8 through 11.

Accessing the Biostar handbook

The URL for the Biostar handbook is https://www.biostarhandbook.com.

To sign in, scroll to the bottom of the page and click the button that says "Access Your Account".

Once you sign in, you will find that the handbook is composed of several books, including one for RNA sequencing.



Because the Biostar handbook subscription is valid for only 6 months, we recommend that you download either the PDF or the eBook version.


Review of RNA sequencing concepts

  • Purpose of RNA sequencing and the biological questions it can answer
  • Experimental considerations
    • Sample preparation
      • Replicates
      • Technical noise
    • Read depth
      • More depth is needed to detect genes with low expression
      • More depth is needed to detect small expression differences between samples
    • RNA quality
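To put rough numbers on the read-depth points above, here is a minimal Python sketch. It assumes a simplified model in which a gene's expected read count is the total number of reads times the fraction of the library its transcripts make up, and in which counts vary like a Poisson variable. It ignores gene length, biological variability between replicates, and library composition, and the gene fraction is a hypothetical value.

```python
import math


def expected_reads(total_reads: int, fraction: float) -> float:
    """Expected reads for a gene whose transcripts make up `fraction` of the library."""
    return total_reads * fraction


def poisson_cv(mean_count: float) -> float:
    """Coefficient of variation (SD / mean) of a Poisson count with this mean.

    Assumption: sampling noise only; real replicates also vary biologically.
    """
    return 1.0 / math.sqrt(mean_count)


# Hypothetical low-expression gene: one in a million sequenced fragments
fraction = 1e-6
for depth in (10_000_000, 40_000_000):
    mean = expected_reads(depth, fraction)
    print(f"{depth:,} reads -> ~{mean:.0f} reads for this gene, CV ~{poisson_cv(mean):.2f}")
```

Under these assumptions, quadrupling the depth from 10 to 40 million reads raises the expected count for this hypothetical gene from about 10 to about 40 and halves its relative sampling noise, which is why low-expression genes and small differences benefit from deeper sequencing.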

RNA sequencing analysis considerations

What are the files that we need for RNA sequencing analysis?

Solution

  • Reference genome or transcriptome
  • Annotation file (GFF or GTF) that describes the genomic features (e.g., genes, transcripts)
  • Raw sequencing data in FASTQ (or fq) format

Review of reference genome and annotation files

Why do we need a reference genome?

Solution

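The reference genome serves as a "known" template. By aligning the sequencing reads from our unknown sample to it, we can determine where in the genome each read originated instead of reconstructing the sample's sequence from scratch.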
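To make the idea of the reference as a "known" concrete, the toy Python sketch below finds where short reads occur in a reference sequence using exact string matching. The sequences are made up, and this is not how real aligners work: they use genome indexes and tolerate mismatches, splicing, and sequencing errors.

```python
def locate_reads(reference: str, reads: list[str]) -> dict[str, list[int]]:
    """Return every 0-based position where each read matches the reference exactly.

    Toy illustration only: no mismatches, indels, or splicing are allowed.
    """
    hits: dict[str, list[int]] = {}
    for read in reads:
        positions = []
        start = reference.find(read)
        while start != -1:
            positions.append(start)
            start = reference.find(read, start + 1)  # keep searching for repeats
        hits[read] = positions
    return hits


# Hypothetical reference and reads
reference = "ACGTACGGTTCAGGCATTACGGT"
reads = ["GGTTCA", "ACGG", "TTTTTT"]
print(locate_reads(reference, reads))
```

A read that matches at several positions (like `ACGG` here) illustrates why multi-mapping reads are a practical concern, and a read with no match (like `TTTTTT`) would be reported as unaligned.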

What file format is the reference genome in and what information does it contain?

Solution

The reference genome is stored in FASTA format. These files have the extension .fasta or .fa; the two extensions are used interchangeably.

Each sequence in a FASTA file begins with a definition (header) line that starts with ">", followed by one or more lines of nucleotide sequence.
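The FASTA structure described above can be read with a few lines of Python. This is a minimal sketch that assumes a well-formed file in which a sequence may wrap across several lines; the record names in the example are hypothetical, and for real work an established parser (for example Biopython's `SeqIO`) is a safer choice.

```python
from io import StringIO
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous record
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record FASTA file
example = StringIO(">chr_toy1 hypothetical example\nACGTACGT\nGGCC\n>chr_toy2\nTTAA\n")
for name, seq in read_fasta(example):
    print(name, len(seq))
```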

What is the annotation file used for?

Solution

The annotation file lists the features of a genome (e.g., genes, transcripts, exons) along with their coordinates and other information. Annotation files are useful in RNA sequencing because they tell us which genes or transcripts the aligned reads overlap, which lets us generate a table of expression counts for our samples on either a per-gene or a per-transcript basis.
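As a rough illustration of how annotation coordinates turn aligned reads into counts, the sketch below pulls `gene` features and their `gene_id` attribute out of GTF lines (tab-separated, with 1-based inclusive start and end in columns 4 and 5) and counts reads whose position falls inside each gene. It is a simplification: each read is reduced to one hypothetical (chromosome, position) pair, and strand, exon structure, overlapping genes, and ambiguous reads are ignored, all of which dedicated counting tools handle. The gene names and coordinates are made up.

```python
import re


def parse_gtf_genes(gtf_lines):
    """Return (chrom, start, end, gene_id) for every 'gene' feature.

    GTF columns are tab-separated; start and end (columns 4 and 5) are 1-based, inclusive.
    """
    genes = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if fields[2] != "gene":
            continue  # keep only gene-level features
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match:
            genes.append((fields[0], int(fields[3]), int(fields[4]), match.group(1)))
    return genes


def count_reads_per_gene(genes, read_positions):
    """Count reads whose (chrom, position) lies inside each gene's interval.

    Simplification: one position per read; strand and ambiguity are ignored.
    """
    counts = {gene_id: 0 for _, _, _, gene_id in genes}
    for chrom, pos in read_positions:
        for g_chrom, start, end, gene_id in genes:
            if chrom == g_chrom and start <= pos <= end:
                counts[gene_id] += 1
    return counts


# Hypothetical annotation lines and read positions
gtf = [
    "#!toy annotation",
    'chr1\ttoy\tgene\t100\t200\t.\t+\t.\tgene_id "geneA";',
    'chr1\ttoy\texon\t100\t150\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";',
    'chr1\ttoy\tgene\t300\t400\t.\t-\t.\tgene_id "geneB";',
]
reads = [("chr1", 120), ("chr1", 199), ("chr1", 350), ("chr1", 250)]
print(count_reads_per_gene(parse_gtf_genes(gtf), reads))
```

The read at position 250 falls between the two hypothetical genes, so it is not counted for either.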

Review of FASTQ files

What is a FASTQ file?

Solution

FASTQ (extension .fastq or .fq) is the format of the files that contain our sequencing data. Like a FASTA file, in which each header line starts with ">" and is followed by the sequence, a FASTQ file has a header line for each sequencing read, but it starts with "@". The read's sequence follows this header (metadata) line, then a line beginning with a "+" sign, and finally a line containing the quality score of each base in the read.
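The four-line FASTQ record described above can be decoded with a short Python sketch. It assumes each record occupies exactly four lines and that quality characters use Phred+33 encoding (used by modern Illumina pipelines), in which the Phred score is the character's ASCII code minus 33. The read in the example is hypothetical.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from four-line FASTQ records.

    Assumes unwrapped records and Phred+33 quality encoding.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting with {header!r}")
        if len(quality) != len(sequence):
            raise ValueError(f"sequence and quality lengths differ for {header!r}")
        # Phred+33: score = ASCII code of the character minus 33
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores


# Hypothetical single-read FASTQ file
example = StringIO("@read1 hypothetical\nACGT\n+\nII#5\n")
for read_id, sequence, scores in read_fastq(example):
    print(read_id, sequence, scores)
```

In Phred+33, `I` decodes to 40 (a very confident base call) and `#` to 2 (a very uncertain one).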

What tool can we use to assess the quality of sequencing data? And how do we aggregate several FastQC reports into one?

Solution

FastQC

To aggregate multiple FastQC reports into a single report, we can use MultiQC.
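One of the checks FastQC reports is per-base sequence quality. The sketch below computes a simplified version of that idea, the mean Phred+33 score at each read position across reads, to show what such a plot summarizes. It is not FastQC's implementation (FastQC shows the distribution at each position, including quartiles, not just a mean), and the cutoff of 20 used for flagging is an illustrative choice, not a FastQC threshold.

```python
def per_base_mean_quality(quality_strings, offset=33):
    """Return the mean Phred score at each read position across all reads.

    Assumes Phred+33 quality strings; reads may differ in length.
    """
    totals: list[int] = []
    counts: list[int] = []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


def low_quality_positions(means, cutoff=20.0):
    """Return 1-based positions whose mean is below an illustrative cutoff."""
    return [pos for pos, mean in enumerate(means, start=1) if mean < cutoff]


# Hypothetical quality strings from three reads
qualities = ["IIIII", "II#5#", "I5##"]
means = per_base_mean_quality(qualities)
print([round(m, 1) for m in means])
print("Positions below cutoff:", low_quality_positions(means))
```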

What type of data clean up can we perform on sequencing data prior to downstream analysis?

Solution

We can trim away adapter sequences and low-quality bases, and remove reads that are of low quality overall. Trimmomatic is one tool that can do this.
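Quality trimming can be sketched in Python as sliding-window trimming followed by a minimum-length filter. This is loosely modeled on the idea behind Trimmomatic's `SLIDINGWINDOW` and `MINLEN` steps, but Trimmomatic's actual algorithm differs in its details, adapter removal is not shown, and the window size, quality cutoff, minimum length, and reads are illustrative values.

```python
def sliding_window_trim(sequence, scores, window=4, min_avg=20.0):
    """Cut the read at the start of the first window whose mean quality is below min_avg.

    Simplified illustration only; Trimmomatic's SLIDINGWINDOW differs in details.
    """
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_avg:
            return sequence[:start], scores[:start]
    return sequence, scores


def clean_reads(reads, window=4, min_avg=20.0, min_len=5):
    """Trim each (read_id, sequence, scores) read, then drop reads shorter than min_len."""
    kept = []
    for read_id, sequence, scores in reads:
        sequence, scores = sliding_window_trim(sequence, scores, window, min_avg)
        if len(sequence) >= min_len:
            kept.append((read_id, sequence, scores))
    return kept


# Hypothetical reads: a good read with a poor 3' tail, and a uniformly poor read
reads = [
    ("read1", "ACGTACGTAC", [38] * 6 + [10] * 4),
    ("read2", "GGGGGG", [12] * 6),
]
for read_id, sequence, _ in clean_reads(reads):
    print(read_id, sequence)
```

Here the low-quality tail of `read1` is trimmed off, while `read2` is trimmed to nothing and then discarded by the length filter.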