Quantitation

Counting as a measure of Expression

Most RNASEQ techniques deal with count data.

  • The reads are mapped to a reference and the number of reads mapped to each gene/transcript is counted
  • Read counts are roughly proportional to gene-length and abundance
  • The more reads the better
  • Artifacts occur because of:
    • Sequencing bias
    • Positional bias along the length of the gene
    • Gene annotations (overlapping genes)
    • Alternate splicing
    • Non-unique genes
    • Mapping errors

The typical steps in quantitation of mapped reads are as follows:

  • Count mapped reads
  • Count each read once (deduplicate)
  • Discard reads that:
    • have poor quality alignment scores
    • are not uniquely mapped
    • overlap several genes
    • have paired reads that do not map together
  • Remember to document what was done
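
The filtering steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Read` record, its field names, the `min_mapq=10` threshold and the example reads are all hypothetical, and real pipelines would take these values from the alignment file rather than from hand-written records.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Read:
    # Hypothetical, simplified view of one aligned read.
    name: str
    mapq: int          # alignment quality score
    n_hits: int        # number of locations the read aligned to
    genes: tuple       # genes overlapped by the alignment
    proper_pair: bool  # True if both mates mapped together
    duplicate: bool    # True if flagged as a duplicate

def count_reads(reads, min_mapq=10):
    """Count reads per gene, applying the discard rules listed above."""
    counts = Counter()
    for r in reads:
        if r.duplicate:           # count each read once
            continue
        if r.mapq < min_mapq:     # poor quality alignment score
            continue
        if r.n_hits != 1:         # not uniquely mapped
            continue
        if len(r.genes) != 1:     # overlaps several genes (or none)
            continue
        if not r.proper_pair:     # paired reads do not map together
            continue
        counts[r.genes[0]] += 1
    return counts

# Made-up reads, one per discard rule plus two that are kept.
reads = [
    Read("r1", 60, 1, ("geneA",), True, False),          # kept
    Read("r2", 60, 1, ("geneA",), True, True),           # duplicate
    Read("r3", 3, 1, ("geneB",), True, False),           # low quality
    Read("r4", 60, 4, ("geneB",), True, False),          # multi-mapped
    Read("r5", 60, 1, ("geneA", "geneB"), True, False),  # several genes
    Read("r6", 60, 1, ("geneB",), False, False),         # mates apart
    Read("r7", 60, 1, ("geneB",), True, False),          # kept
]

counts = count_reads(reads)
print(dict(counts))
```

Keeping the threshold as a named parameter makes it easy to record what was done, which is the last step in the list.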

Count Normalization

There are three metrics commonly used to attempt to normalize for sequencing depth and gene length.
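
The text does not name the three metrics, so the sketch below assumes they are RPKM, FPKM and TPM, which are the measures usually quoted for correcting both sequencing depth and gene length. The gene names, counts and lengths are invented for illustration. FPKM uses the same formula as RPKM but counts fragments (read pairs) instead of single reads, so it is not given a separate function here.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene per million mapped reads."""
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (total * lengths_bp[g]) for g in counts}

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rate = {g: counts[g] / (lengths_bp[g] / 1e3) for g in counts}  # reads per kb
    rate_sum = sum(rate.values())
    return {g: rate[g] * 1e6 / rate_sum for g in counts}

# Hypothetical counts and gene lengths (in base pairs).
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)

for gene in counts:
    print(gene, round(rpkm_values[gene], 1), round(tpm_values[gene], 1))
```

In this toy example geneB has three times the raw count of geneA but is also three times as long, so both end up with the same normalized value. TPM values always sum to one million within a sample, which is one reason they are often easier to compare across samples than RPKM.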

Counting as a measure of Expression

Figure: an example of a counts matrix for RNASEQ data.
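
As a rough illustration of the layout, a counts matrix has one row per gene and one column per sample. The sketch below builds one from per-sample counts. The sample names, gene names and numbers are made up, and filling a missing gene with zero is an assumption.

```python
# Hypothetical per-sample read counts, as a counting program might report them.
per_sample = {
    "sample1": {"geneA": 10, "geneB": 0, "geneC": 250},
    "sample2": {"geneA": 12, "geneC": 310},  # geneB absent: treated as 0
}

samples = list(per_sample)
genes = sorted({g for sample_counts in per_sample.values() for g in sample_counts})

# Rows are genes, columns are samples.
matrix = [[per_sample[s].get(g, 0) for s in samples] for g in genes]

print("gene\t" + "\t".join(samples))
for gene, row in zip(genes, matrix):
    print(gene + "\t" + "\t".join(str(n) for n in row))
```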

Log Transformed Data

Because of its vast dynamic range, RNASEQ data is typically log transformed in order to provide better visualizations and to present analysis software with a more "normal" distribution.
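
A minimal sketch of the transformation, assuming a base-2 logarithm and a pseudocount of 1 so that zero counts stay defined. Both choices are common conventions rather than something the text specifies, and the counts are invented.

```python
import math

# Hypothetical raw counts spanning several orders of magnitude.
raw_counts = [0, 1, 9, 99, 9999, 999999]

# log2(count + 1): the +1 pseudocount avoids taking log of zero.
log_counts = [math.log2(c + 1) for c in raw_counts]

for raw, logged in zip(raw_counts, log_counts):
    print(raw, round(logged, 2))
```

The raw values range from 0 to nearly a million, while the transformed values all fall between 0 and 20, which is what makes plots and downstream statistics easier to handle.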

Common counting Programs

  • Subread (featureCounts)
  • STAR (--quantMode)
  • HTSeq (htseq-count)
  • RSEM (RNA-Seq by Expectation Maximization)
  • Salmon, Kallisto (pseudoaligners that also quantify)