Lesson 11 Practice

Objectives

In this lesson, we learned to

merge multiple FASTQC reports into one
perform data cleanup (quality and adapter trimming) to prepare our sequencing reads for downstream analysis

Here, we will put what we learned into practice.

Merging Golden Snidget FASTQC reports into one

Before getting started, we should change into the ~/biostar_class/snidget/QC directory where the Golden Snidget sequencing data and FASTQC reports were saved. How do we do this?

Solution

cd ~/biostar_class/snidget/QC

How do we merge the FASTQC results from the Golden Snidget dataset into one?

Solution

Since we are in the ~/biostar_class/snidget/QC folder, which contains our FASTQC results, we can use "." to mean "here in this folder". MultiQC will look for output logs in the specified folder.

multiqc --filename multiqc_report_snidget .
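Before running MultiQC, it can help to confirm that the FASTQC outputs are actually in the current folder. The following is only a minimal sketch, assuming FASTQC wrote its usual *_fastqc.zip archives here; the helper name count_fastqc_zips is made up for illustration and is not part of FASTQC or MultiQC.

```shell
# Count the FASTQC result archives (*_fastqc.zip) in a folder.
# count_fastqc_zips is a hypothetical helper, not a FASTQC or MultiQC command.
count_fastqc_zips() {
    dir="${1:-.}"
    # find prints one line per matching file; wc -l counts those lines.
    find "$dir" -maxdepth 1 -name '*_fastqc.zip' | wc -l | tr -d ' '
}

# Count the archives in the current folder.
count_fastqc_zips .
```

If the count is 0, we are probably in the wrong folder or FASTQC has not been run yet.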

Next, copy the MultiQC output to the public directory. Do you remember how to do this?

Solution

cp multiqc_report_snidget.html ~/public/multiqc_report_snidget.html

Can you configure the Golden Snidget MultiQC output's General Statistics table to show the percentage of modules that failed?

In the General Statistics table of the Golden Snidget MultiQC report, can you assign different colors to distinguish the FASTQ files for the BORED and EXCITED groups?

In the overrepresented sequences plot, how many samples have warnings and how many failed?

Solutions

Golden Snidget FASTQC files MultiQC results.

Quality and adapter trimming

Let's go back to the biostar_class directory and create a folder called practice_trimming for this exercise. How do we do this?

Solution

This depends on where you currently are (i.e., your present working directory). From there, go back to the biostar_class folder and create the new folder.

cd ~/biostar_class

mkdir practice_trimming

After the "practice_trimming" directory has been created, change into this directory. How do we do this?

Solution

cd practice_trimming

Next, download a FASTQ file from NCBI/SRA to practice trimming with.

fastq-dump --split-files -X 10000 SRR1553606

Once the download is complete the message below will appear.

Read 10000 spots for SRR1553606
Written 10000 spots for SRR1553606

How many FASTQ files were downloaded? And from the file names, is this from paired-end or single-end sequencing?

Solution

ls

Two FASTQ files were downloaded (SRR1553606_1.fastq and SRR1553606_2.fastq), so this is paired-end sequencing.
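To double-check the download, we can also count the reads in each file. This is a sketch that assumes every FASTQ record occupies exactly four lines; count_reads is a name made up here, not an existing tool.

```shell
# Count reads in a FASTQ file, assuming exactly four lines per record.
# count_reads is a hypothetical helper defined only for this exercise.
count_reads() {
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

for f in SRR1553606_*.fastq; do
    [ -e "$f" ] || continue   # skip if the pattern matched nothing
    echo "$f: $(count_reads "$f") reads"
done
```

If both mates were written for every spot, each file should report 10000 reads, matching the 10000 spots in the fastq-dump message.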

Let's run FASTQC on these files. Do you recall how to do this?

Solution

fastqc SRR1553606_*.fastq

Copy the FASTQC outputs (html files) to the public directory.

Solution

cp SRR1553606_*_fastqc.html ~/public

How is the quality, and is there adapter contamination in the FASTQ files for SRR1553606? If so, can we trim away the adapters and poor-quality reads? For this exercise, our adapter sequence is shown below. Can we create an input file called nextera_adapter.fa containing it?

>nextera
CTGTCTCTTATACACATCTCCGAGCCCACGAGAC

Solution

The quality of both FASTQ files is not great, so we can remove the poor-quality reads and the adapters. First, create the adapter file.

nano nextera_adapter.fa

Copy and paste the adapter sequence below into nano, then press Control-X and save to exit.

>nextera
CTGTCTCTTATACACATCTCCGAGCCCACGAGAC
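If you prefer not to use an editor, the same file can be written directly from the command line. This is one possible sketch using printf; note that it overwrites nextera_adapter.fa in the current folder if one already exists.

```shell
# Write the two-line FASTA file: a header line, then the adapter sequence.
printf '>nextera\nCTGTCTCTTATACACATCTCCGAGCCCACGAGAC\n' > nextera_adapter.fa

# Print the file to confirm its contents.
cat nextera_adapter.fa
```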

trimmomatic PE SRR1553606_1.fastq SRR1553606_2.fastq SRR1553606_trimmed_1.fastq SRR1553606_trimmed_1_unpaired.fastq SRR1553606_trimmed_2.fastq SRR1553606_trimmed_2_unpaired.fastq SLIDINGWINDOW:4:30 ILLUMINACLIP:nextera_adapter.fa:2:30:5 MINLEN:50

Run FASTQC on the trimmed output. Any improvements?

Solution

fastqc SRR1553606_trimmed_1.fastq SRR1553606_trimmed_2.fastq
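Besides the FASTQC report, a quick command-line check can confirm that the MINLEN:50 setting took effect. This is a sketch that assumes four-line FASTQ records; shortest_read is a name made up here for illustration.

```shell
# Print the length of the shortest read in a FASTQ file.
# Line 2 of every four-line record holds the sequence.
# shortest_read is a hypothetical helper defined only for this exercise.
shortest_read() {
    awk 'NR % 4 == 2 { n = length($0); if (min == "" || n < min) min = n }
         END { print min }' "$1"
}

for f in SRR1553606_trimmed_1.fastq SRR1553606_trimmed_2.fastq; do
    [ -e "$f" ] || continue   # skip files that do not exist yet
    echo "$f: shortest read is $(shortest_read "$f") bases"
done
```

With MINLEN:50 as the last trimming step, neither file should report a value below 50.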

What is another tool that we can use to perform quality and adapter trimming on FASTQ files?

Solution

BBDuk

bbduk.sh in=SRR1553606_1.fastq in2=SRR1553606_2.fastq out=SRR1553606_bbduk_1.fastq out2=SRR1553606_bbduk_2.fastq qtrim=r overwrite=true trimq=30 ref=nextera_adapter.fa ktrim=r k=16 mink=10 hdist=1 tpe minlen=50

MultiQC report for Golden Snidget

Golden Snidget MultiQC report