Post-alignment QC

After mapping, the next step is post-alignment QC, which reports metrics such as the overall alignment rate (i.e., how many sequences aligned to the reference). To run it, select the "Aligned" reads data node and then select "QA/QC" from the menu. From there, click "Post-alignment QA/QC". After QC completes, click on the "Post-alignment QA/QC" task node to view the results.

The first item in the "Post-alignment QA/QC" report is an alignment statistics table. Its columns are explained below.

  • Total reads: This reports the number of reads or sequences in a sample. For paired end sequencing, this refers to the number of read pairs.
  • Total alignment: This column indicates the number of times reads in a sample mapped to the genome. Do not confuse this with the number or percent of reads that mapped.
  • Aligned: The percentage of reads that mapped to the genome is provided here. The next four columns indicate the percentage of reads that were mapped uniquely (i.e., to one location on the genome) or non-uniquely (i.e., multi-mappers), and whether both reads in the pair mapped (paired) or only one did (singleton). The value under the Aligned column is the sum of the Unique singleton, Unique paired, Non-unique paired, and Non-unique singleton columns.
  • Coverage: This column reports the percentage of bases in the genome that are covered by the reads in a sample.
  • Avg. coverage depth: The average number of alignments in the region(s) of the genome covered by sequencing reads.
  • Avg. length: The values in this column correspond to the average length (i.e., number of bases) of the mapped reads in a sample.
  • Avg. quality: This column reports the average quality of the mapped reads in a sample.
  • %GC: The GC percentage of the mapped reads in a sample is given here.

All samples have greater than 97% alignment rate.
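
As a rough illustration of how the Aligned, Coverage, and Avg. coverage depth columns relate to the underlying counts, the Python sketch below recomputes them from toy inputs. The function names, the per-base depth list, and the percentages are hypothetical. The sketch assumes depth is averaged over covered positions only, as the column description suggests; it is not Partek Flow's implementation.

```python
def aligned_percent(unique_paired, unique_singleton,
                    non_unique_paired, non_unique_singleton):
    """Aligned % as the sum of the four alignment-type percentages."""
    return (unique_paired + unique_singleton
            + non_unique_paired + non_unique_singleton)


def coverage_summary(depth_per_base):
    """Return (percent of positions covered, mean depth over covered positions).

    depth_per_base: one integer per reference position giving the number of
    alignments overlapping that position (a hypothetical toy input).
    """
    if not depth_per_base:
        raise ValueError("depth_per_base must not be empty")
    covered = [d for d in depth_per_base if d > 0]
    percent_covered = 100.0 * len(covered) / len(depth_per_base)
    avg_depth = sum(covered) / len(covered) if covered else 0.0
    return percent_covered, avg_depth


# Made-up percentages for one sample: the Aligned value is their sum.
print(aligned_percent(90.0, 1.5, 6.0, 0.5))

# Toy 8-base "genome" in which only positions 3 to 6 are covered.
print(coverage_summary([0, 0, 2, 4, 4, 2, 0, 0]))
```

Averaging over covered positions only means a sample with a few deeply sequenced regions can show a high average depth alongside a low coverage percentage, so the two columns are best read together.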

The information shown in the table above is also presented as plots. Among these is a stacked bar chart showing the percentage breakdown of the alignment types described below.

  • Unique paired occurs when both reads in paired end sequencing align to only one genomic region.
  • Non-unique paired happens when both reads in paired end sequencing align to more than one genomic region. These are considered multi-mappers.
  • Unique singleton means that only one read of the pair aligned, and it aligned to a single genomic region.
  • Non-unique singleton means that only one read of the pair aligned, and it aligned to multiple genomic regions.
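
The four categories above can be sketched as a small classification function. This is a minimal Python illustration, not Partek Flow's logic: the function name and the hit-count inputs are hypothetical, and a pair in which one mate maps uniquely and the other maps to several locations is treated as non-unique here, which is an assumption the definitions above do not settle.

```python
def classify_pair(read1_hits, read2_hits):
    """Classify a read pair by alignment type.

    read1_hits / read2_hits: number of genomic locations each mate aligned to
    (0 means the mate did not align).
    """
    aligned = [h for h in (read1_hits, read2_hits) if h > 0]
    if not aligned:
        return "Unaligned"
    # Assumption: any mate with more than one hit makes the pair non-unique.
    uniqueness = "Unique" if all(h == 1 for h in aligned) else "Non-unique"
    pairing = "paired" if len(aligned) == 2 else "singleton"
    return f"{uniqueness} {pairing}"


for hits in [(1, 1), (3, 2), (1, 0), (0, 4), (0, 0)]:
    print(hits, "->", classify_pair(*hits))
```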

Note

Ideally, reads align uniquely, because unique alignments leave no ambiguity about which gene or transcript a read belongs to when generating the expression matrix.

The next visualization provides the number of reads in a sample. Again, for paired end sequencing, this refers to the number of read pairs.

Also provided are bar charts of the average sequencing depth and the genomic coverage for each sample.

Plots of the average quality at each base position and of the quality distribution across all aligned reads are also available for the samples.

The next plot shows the alignments per read. Each sample has 1.99 alignments per read (close to two) because Partek Flow counts two alignments when both reads in a pair map to the genome. The value in this dataset is not exactly two because in some pairs only one of the two reads mapped.
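
To see how a value just under two can arise, the Python sketch below averages the alignment counts over a set of read pairs. The function name and the pair counts are hypothetical, chosen only to reproduce a value of 1.99; the actual breakdown in this dataset is not given here.

```python
def alignments_per_read(pairs_by_alignment_count):
    """Average number of alignments per read pair.

    pairs_by_alignment_count maps the number of alignments recorded for a
    pair (2 = both mates mapped once, 1 = only one mate mapped) to the
    number of pairs with that count.
    """
    total_pairs = sum(pairs_by_alignment_count.values())
    if total_pairs == 0:
        raise ValueError("no read pairs supplied")
    total_alignments = sum(
        n_alignments * n_pairs
        for n_alignments, n_pairs in pairs_by_alignment_count.items()
    )
    return total_alignments / total_pairs


# Hypothetical sample: 990 pairs with both mates mapped, 10 singletons.
print(round(alignments_per_read({2: 990, 1: 10}), 2))
```

Multi-mapping pairs would pull the average in the opposite direction, so a value near two reflects the balance of singletons and multi-mappers rather than either alone.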

Click on an individual sample to view its sample-level post-alignment QC results. The first plot shows the alignments per read for the sample. Most reads have two alignments in this particular example, which is ideal for paired end sequencing.

Next, there is a pie chart that shows the proportion of reads in a sample that aligned and the proportion that did not.

In paired end sequencing, the number of bases spanning from the 5' end of one read to the 5' end of its mate is known as the outer distance. This should be approximately equal to the nucleotide fragment length used in library preparation. Deviation of the outer distance from the expected value could indicate the presence of structural variants such as insertions or deletions. Also, because the read length in this dataset is 151 bases, the two reads in a pair are expected to overlap when aligned, given the selected range for fragment lengths.
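
The overlap expectation follows from simple arithmetic: two reads of length L sequenced from opposite ends of a fragment overlap whenever the outer distance is shorter than 2 × L. The Python sketch below illustrates this with the 151-base read length from this dataset. The function name and the example outer distances are hypothetical, and the sketch ignores soft-clipping and alignment gaps.

```python
def expected_overlap(outer_distance, read_length=151):
    """Approximate number of bases shared by the two mates of a pair.

    Mates overlap when outer_distance < 2 * read_length. The overlap cannot
    exceed the fragment itself, so it is capped at outer_distance for
    fragments shorter than a single read.
    """
    if outer_distance <= 0 or read_length <= 0:
        raise ValueError("outer_distance and read_length must be positive")
    overlap = 2 * read_length - outer_distance
    return max(0, min(overlap, outer_distance))


# Hypothetical outer distances, in bases.
for distance in (100, 250, 302, 400):
    print(distance, "->", expected_overlap(distance), "overlapping bases")
```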

Tip

Read the article at https://www.cureffi.org/2012/12/19/forward-and-reverse-reads-in-paired-end-sequencing/ to get a basic idea of paired end sequencing.

The base composition and read quality scores are also available in the sample-level post-alignment QC report.

The confidence that a read was aligned or mapped to the correct location on the genome is another important post-alignment QC metric and is indicated by the mapping quality (MAPQ). Mapping quality is Phred-scaled, so the probability that a read was aligned to the wrong location can be estimated from it as:

P(incorrect alignment) = 10^(-MAPQ / 10)

Most reads in this dataset have a mapping quality of 60, which corresponds to an error probability of 10^-6, or 0.0001%.
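
For a quick numerical check of the mapping quality conversion, the Python sketch below assumes the standard Phred-scaled convention used in SAM/BAM files, in which the error probability equals 10^(-MAPQ/10). The function names are illustrative only.

```python
import math


def mapq_to_error_probability(mapq):
    """Probability that the reported mapping position is wrong."""
    return 10 ** (-mapq / 10)


def error_probability_to_mapq(p_error):
    """Inverse conversion: Phred-scaled mapping quality for a probability."""
    if not 0 < p_error <= 1:
        raise ValueError("p_error must be in (0, 1]")
    return -10 * math.log10(p_error)


for mapq in (10, 20, 30, 60):
    p = mapq_to_error_probability(mapq)
    print(f"MAPQ {mapq}: error probability {p:.0e} ({p * 100:.4f}%)")
```

Because the scale is logarithmic, every increase of 10 in mapping quality reduces the estimated error probability tenfold.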

Finally, the length distribution for the aligned reads is also provided in the sample-level post-alignment QC report.