11. Advanced quality control with MultiQC

This page uses content directly from the Biostar Handbook by Istvan Albert.

Start by activating the bioinfo environment.

conda activate bioinfo

Create a new directory for the multiqc data.

mkdir multi
cd multi

Retrieve the data and decompress it.

curl http://data.biostarhandbook.com/data/sequencing-platform-data.tar.gz --output sequencing-platform-data.tar.gz
tar zxvf sequencing-platform-data.tar.gz
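
A failed download can leave behind an empty or incomplete file, so it may be worth listing the archive contents to confirm it is readable. The following is a minimal sketch; the helper name list_archive is hypothetical and not part of the original instructions.

```shell
# Hypothetical helper: list the contents of a gzip-compressed tar archive
# without extracting it.
list_archive() {
    archive=$1
    # -s is true only if the file exists and is not empty
    if [ ! -s "$archive" ]; then
        echo "archive missing or empty: $archive" >&2
        return 1
    fi
    # t = list contents, z = gzip compression, f = read from this file
    tar tzf "$archive"
}

list_archive sequencing-platform-data.tar.gz || echo "download the archive first" >&2
```

If the listing shows the FASTQ files used in the next step, the download and the archive are most likely intact.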

Now we will run FastQC to compare the Illumina data with the Ion Torrent data. This command creates two FastQC reports; the --extract flag tells FastQC to unzip each report so that its data directory is kept.

fastqc --extract illumina.fq iontorrent.fq
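
Each extracted FastQC directory contains a summary.txt file with one PASS, WARN, or FAIL status per analysis module. As a rough sketch, the statuses for both platforms can be printed side by side in the terminal; the helper name summarize_fastqc is hypothetical.

```shell
# Hypothetical helper: print the status and module name from the
# summary.txt file in each extracted FastQC directory.
summarize_fastqc() {
    for dir in "$@"; do
        if [ -f "$dir/summary.txt" ]; then
            echo "== $dir =="
            # summary.txt is tab-separated: status, module name, input file
            cut -f 1,2 "$dir/summary.txt"
        else
            echo "missing: $dir/summary.txt" >&2
        fi
    done
}

summarize_fastqc illumina_fastqc iontorrent_fastqc
```

This gives a quick text overview before any combined report is built.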


Now we can use MultiQC to combine the data report directories.

Try installing the MultiQC program this way.

pip install multiqc

If MultiQC fails with error messages, it may help to create a new "qc" environment and run MultiQC from there.

conda create -n qc python=3.7
conda activate qc
conda install multiqc

You can test the installation by displaying the help documentation.

multiqc --help
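
When switching between environments, it is easy to lose track of which tools are available. The sketch below checks whether a program is on the PATH of the active environment; the helper name check_tool is hypothetical.

```shell
# Hypothetical helper: report whether a program is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# Check both programs used on this page; keep going if one is missing.
for tool in fastqc multiqc; do
    check_tool "$tool" || true
done
```

If a tool is reported as not found, the wrong environment is probably active, or the installation did not complete.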

Now we can try running MultiQC.

multiqc illumina_fastqc iontorrent_fastqc
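
MultiQC can only combine reports that exist, so a small wrapper that checks the input directories first may give a clearer error than running the command directly. This is a sketch under that assumption; the helper name run_multiqc is hypothetical, and the --force flag is added here so that a rerun overwrites an earlier report.

```shell
# Hypothetical helper: confirm every input directory exists, then run MultiQC.
run_multiqc() {
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "input directory not found: $dir" >&2
            return 1
        fi
    done
    # --force overwrites any report left over from a previous run
    multiqc --force "$@"
}

run_multiqc illumina_fastqc iontorrent_fastqc || echo "MultiQC did not run" >&2
```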

MultiQC writes its report to the file "multiqc_report.html" in the current directory. Open this file in a web browser.

The report contains the following sections:

General Statistics
FASTQC Sequence Counts
Sequence Quality Histograms
Per Sequence Quality Scores
Per Base Sequence Content
Per Sequence GC Content
FASTQC: Per Base N Content
Sequence Length Distribution
Sequence Duplication Levels
Overrepresented sequences
Adapter Content
Status Checks
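
Alongside the HTML report, MultiQC writes a multiqc_data directory with the underlying tables as tab-separated text, including multiqc_general_stats.txt. As a small sketch, the column names of such a table can be listed from the terminal; the helper name list_columns is hypothetical.

```shell
# Hypothetical helper: print the column names of a tab-separated table,
# one per line, taken from its header row.
list_columns() {
    table=$1
    if [ ! -f "$table" ]; then
        echo "table not found: $table" >&2
        return 1
    fi
    head -n 1 "$table" | tr '\t' '\n'
}

list_columns multiqc_data/multiqc_general_stats.txt || echo "run MultiQC first" >&2
```

The listed names show which statistics are available for each sample in the General Statistics table.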