Lesson 9: Trimming

Lesson 8 Review

Lesson 8 introduced the FASTQ file, the format used to store NGS data. Participants also learned to assess the quality of the sequences in FASTQ files using FastQC. This step is essential because it shows whether the sequencing is of high quality and whether the sequences contain contaminants such as adapters. The QC results revealed that while the sequencing data quality is good, adapters are present in the sequences. Adapters interfere with the mapping stage, where algorithms determine where in the genome each sequence came from. Thus, the next step is to trim away the adapters and run QC again on the trimmed data to confirm that the contamination has been removed.

Learning objectives

At the end of this session, participants will be able to use the tool Trimmomatic to trim away adapters from the hcc1395 sequencing data.

Sign onto Biowulf and Request an Interactive Session

Before getting started, sign onto Biowulf and request an interactive session. In the ssh command below, replace user with the participant's Biowulf user ID.

ssh user@biowulf.nih.gov

Next, request an interactive session with 12 GB of RAM and 10 GB of local temporary storage.

sinteractive --mem=12gb --gres=lscratch:10

Adapter Trimming with Trimmomatic

Load Trimmomatic

Trimmomatic is a tool that can trim both low-quality sequence and adapters. To use this program on Biowulf, load its module.

module load trimmomatic

Make a New Folder to Store the Trimmed Reads

mkdir trimmed_reads

Run Trimmomatic

Trimmomatic will be run through the parallel command, which is introduced here. This command enables analysts to run multiple tasks in parallel, such as trimming several high-throughput sequencing samples at once. The command construct is broken down below.

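  • cat prints the hcc1395 sample IDs stored in the file hcc1395_sample_ids.txt. Rather than printing to the terminal, | sends the output from cat to parallel.
  • The Trimmomatic construct is enclosed in double quotes within the parallel command, and parallel replaces each {} with one sample ID. The components are as follows.
      • java -jar $TRIMMOJAR runs the Trimmomatic jar file at the path stored in $TRIMMOJAR.
      • PE selects paired-end mode.
      • -phred33 states that base qualities use the Phred+33 encoding.
      • reads/{}_R1.fq and reads/{}_R2.fq are the input forward and reverse reads.
      • The four files in trimmed_reads/ are the outputs. For each read direction, the trimmed file holds reads whose mate also survived trimming, and the unpaired file holds reads whose mate was dropped.
      • ILLUMINACLIP:references/illumina_multiplex.fa:2:30:5 removes the adapters listed in the FASTA file, allowing 2 seed mismatches, with a palindrome clip threshold of 30 and a simple clip threshold of 5.
      • MINLEN:25 drops reads shorter than 25 bases after trimming.
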
cat hcc1395_sample_ids.txt | parallel "java -jar $TRIMMOJAR PE -phred33 reads/{}_R1.fq reads/{}_R2.fq \
trimmed_reads/{}_trimmed_R1.fq trimmed_reads/{}_unpaired_R1.fq trimmed_reads/{}_trimmed_R2.fq trimmed_reads/{}_unpaired_R2.fq \
ILLUMINACLIP:references/illumina_multiplex.fa:2:30:5 MINLEN:25"
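
To make the role of {} concrete, the sketch below prints the command that would be built for each sample ID without running Trimmomatic. It swaps parallel for a plain while loop so that it runs anywhere bash is available, and the sample IDs sample_A and sample_B are hypothetical placeholders, not the real contents of hcc1395_sample_ids.txt.

```shell
#!/usr/bin/env bash
# Dry run: print the Trimmomatic command that would be built per sample ID.
# Nothing is trimmed here; the loop only echoes text.
set -euo pipefail

# Hypothetical sample IDs for illustration only.
ids_file=$(mktemp)
printf '%s\n' sample_A sample_B > "$ids_file"

# Each line read from the file plays the role of {} in the parallel command.
# \$TRIMMOJAR is escaped so the variable name is printed rather than expanded.
commands=$(while read -r id; do
  echo "java -jar \$TRIMMOJAR PE -phred33 reads/${id}_R1.fq reads/${id}_R2.fq" \
       "trimmed_reads/${id}_trimmed_R1.fq trimmed_reads/${id}_unpaired_R1.fq" \
       "trimmed_reads/${id}_trimmed_R2.fq trimmed_reads/${id}_unpaired_R2.fq" \
       "ILLUMINACLIP:references/illumina_multiplex.fa:2:30:5 MINLEN:25"
done < "$ids_file")

echo "$commands"
rm -f "$ids_file"
```

GNU parallel also offers a --dry-run option that prints the commands it would run instead of executing them, which may be a convenient check before launching the real job.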
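
Run QC on the Trimmed Reads

Make a folder to hold the QC reports for the trimmed reads.

mkdir pre_alignment_qc_trimmed

Load FastQC and MultiQC.

module load fastqc
module load multiqc

Run FastQC on the trimmed paired reads, then aggregate the reports with MultiQC.

fastqc trimmed_reads/*trimmed*.fq -o pre_alignment_qc_trimmed/
multiqc pre_alignment_qc_trimmed/ --filename hcc1395_multiqc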
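
Alongside the QC reports, a quick read count can confirm that the paired trimmed files are still in sync, since the R1 and R2 trimmed files of a sample should hold the same number of reads. The following is a minimal sketch, assuming each FASTQ record spans exactly four lines; the demo files and the count_reads helper are made up for illustration and are not part of the lesson data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count reads in a FASTQ file, assuming each record spans exactly 4 lines.
count_reads() {
  local n_lines
  n_lines=$(wc -l < "$1")
  echo $(( n_lines / 4 ))
}

# Tiny made-up paired files for illustration only.
workdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$workdir/demo_trimmed_R1.fq"
printf '@r1\nGGCA\n+\nIIII\n@r2\nCATG\n+\nIIII\n' > "$workdir/demo_trimmed_R2.fq"

r1_reads=$(count_reads "$workdir/demo_trimmed_R1.fq")
r2_reads=$(count_reads "$workdir/demo_trimmed_R2.fq")
echo "R1: $r1_reads reads, R2: $r2_reads reads"

# Paired outputs should contain the same number of reads.
if [ "$r1_reads" -eq "$r2_reads" ]; then
  pair_status="matched"
else
  pair_status="mismatched"
fi
echo "Pair status: $pair_status"
rm -rf "$workdir"
```

On real data, the same function could be pointed at the files in trimmed_reads/ in place of the demo files.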
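
The scp command below copies the MultiQC report from Helix to the current directory, typically on a local computer, so that it can be opened in a browser. Replace wuz8 and the path with the participant's own user ID and directory.

scp wuz8@helix.nih.gov:/data/wuz8/hcc1395_b4b/hcc1395_multiqc.html .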