Lesson 8 Practice

In this help session, participants will practice trimming with Trimmomatic. Because the FASTQC report for the HBR-UHR data indicated no adapter contamination, the session uses data downloaded from the SRA instead.

Before getting started, be sure to sign onto Biowulf.

Solution
ssh user@biowulf.nih.gov

Then change into the participant's data directory.

Solution
cd /data/user

Create a new folder called practice_trim and change into it.

Solution
mkdir practice_trim
cd practice_trim

Request an interactive session with 12 GB of RAM (memory) and 10 GB of local temporary storage.

Solution
sinteractive --mem=12gb --gres=lscratch:10

Use the SRA Toolkit to obtain the first 10,000 sequences for SRR1553606. These are paired-end sequences.

Solution
module load sratoolkit
fastq-dump --split-files -X 10000 SRR1553606
Read 10000 spots for SRR1553606
Written 10000 spots for SRR1553606
Stay in `practice_trim` and list its contents.
ls
SRR1553606_1.fastq  SRR1553606_2.fastq
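
As an optional check that the download produced what was requested, the reads in each file can be counted. The sketch below assumes standard four-line FASTQ records; `count_reads` is a hypothetical helper name, not part of the SRA Toolkit.

```shell
# Count reads in a FASTQ file, assuming the standard 4-line record layout
# (header, sequence, "+", qualities). count_reads is a hypothetical helper.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Report the count for each mate file that is present in the current directory.
for f in SRR1553606_1.fastq SRR1553606_2.fastq; do
    if [ -f "$f" ]; then
        echo "$f: $(count_reads "$f") reads"
    fi
done
```

If the download completed as shown above, each file should report the 10000 spots that `fastq-dump` said it wrote.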

QC the two FASTQ files for SRR1553606 (i.e., SRR1553606_1.fastq and SRR1553606_2.fastq).

Solution
module load fastqc
fastqc *.fastq

Copy the HTML FASTQC reports to the Downloads folder on your local computer to view the results. Is there adapter contamination?

Solution
On the local computer, change into `Downloads` and obtain the HTML FASTQC reports using `scp`.
scp user@helix.nih.gov:/data/user/practice_trim/SRR1553606_1_fastqc.html  .
scp user@helix.nih.gov:/data/user/practice_trim/SRR1553606_2_fastqc.html  .
The reports show apparent Nextera transposase adapter contamination in these data. Would these data also require quality trimming?
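
The adapter verdict can also be read on Biowulf without opening the HTML. FASTQC writes a `.zip` archive next to each HTML report, and the `summary.txt` inside it lists a PASS, WARN, or FAIL status per module. The sketch below is run from `practice_trim` and assumes that `unzip` is available and that the default FASTQC output names were kept; `adapter_status` is a hypothetical helper name.

```shell
# Print the PASS/WARN/FAIL status of the "Adapter Content" module from a
# FASTQC summary.txt stream on standard input (tab-separated columns:
# status, module name, file name). adapter_status is a hypothetical helper.
adapter_status() {
    awk -F '\t' '$2 == "Adapter Content" { print $1 }'
}

# Assumes the default FASTQC archive names and that unzip is installed.
for zip in SRR1553606_1_fastqc.zip SRR1553606_2_fastqc.zip; do
    if [ -f "$zip" ]; then
        # summary.txt sits inside a folder named after the report
        echo "$zip: $(unzip -p "$zip" '*/summary.txt' | adapter_status)"
    fi
done
```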

Try trimming away the adapter sequence below from the FASTQ files and rerun FASTQC.

>nextera
CTGTCTCTTATACACATCTCCGAGCCCACGAGAC

Hint: copy the above sequence and save it to the file nextera.fa.

Solution
nano nextera.fa
Paste the two lines above (the `>nextera` header and the sequence) into the editor, then press Control-X, answer Y to save, and press Enter to confirm the file name and exit.
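
As an alternative to an interactive editor, the same two-line FASTA file can be written directly from the command line. This is a minimal sketch using `printf`; it overwrites any existing `nextera.fa` in the current directory.

```shell
# Write the adapter as a two-line FASTA record (header, then sequence).
# Note: this overwrites nextera.fa if it already exists.
printf '>nextera\n%s\n' 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC' > nextera.fa

# Show the file to confirm its contents.
cat nextera.fa
```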

Trim away the sequence in nextera.fa using Trimmomatic.

  • For quality trimming, use a SLIDINGWINDOW of 4 bases with a required average quality score of 30 within the window.
  • For adapter trimming, set the adapter stringencies:
      • seed mismatches: 2
      • palindrome clip threshold: 30
      • simple clip threshold: 5
  • Retain only those trimmed sequences that are at least 50 bases long.

Solution
module load trimmomatic
java -jar $TRIMMOJAR PE SRR1553606_1.fastq SRR1553606_2.fastq SRR1553606_trimmed_1.fastq SRR1553606_trimmed_1_unpaired.fastq SRR1553606_trimmed_2.fastq SRR1553606_trimmed_2_unpaired.fastq SLIDINGWINDOW:4:30 ILLUMINACLIP:nextera.fa:2:30:5 MINLEN:50
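
To make the link between the listed settings and the step strings in the command explicit, the sketch below assembles each step from named variables and prints the result without running Trimmomatic. The variable names are illustrative only; Trimmomatic reads just the colon-separated strings.

```shell
# Settings from the exercise (variable names are illustrative only).
adapters=nextera.fa
seed_mismatches=2
palindrome_clip=30
simple_clip=5
window_size=4
window_quality=30
min_length=50

# SLIDINGWINDOW:<window size>:<required average quality>
slidingwindow="SLIDINGWINDOW:${window_size}:${window_quality}"
# ILLUMINACLIP:<adapter FASTA>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>
illuminaclip="ILLUMINACLIP:${adapters}:${seed_mismatches}:${palindrome_clip}:${simple_clip}"
# MINLEN:<minimum read length to keep>
minlen="MINLEN:${min_length}"

# Print the steps in the same order as the command above (dry run only).
echo "$slidingwindow $illuminaclip $minlen"
```

Trimmomatic applies the steps in the order they appear on the command line, so reordering these strings changes the order of the trimming operations.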

QC the two paired and trimmed FASTQ files.

Solution
fastqc SRR1553606_trimmed_1.fastq SRR1553606_trimmed_2.fastq

Copy the two HTML reports for the trimmed FASTQ files to the local Downloads folder to review the results.

Solution
scp user@helix.nih.gov:/data/user/practice_trim/SRR1553606_trimmed_1_fastqc.html .
scp user@helix.nih.gov:/data/user/practice_trim/SRR1553606_trimmed_2_fastqc.html .

Did trimming remove the adapters and improve the data quality?

Solution Yes
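
For a rough command-line cross-check of this answer, the reads that still contain the start of the adapter can be counted before and after trimming. The sketch below looks only for exact copies of the first 19 bases of the Nextera sequence given earlier, so it undercounts adapters with sequencing errors or partial adapters at read ends. The 19-base prefix is a choice made for illustration, and `count_adapter_reads` is a hypothetical helper name.

```shell
# Count reads whose sequence line contains an exact copy of a given string.
# Assumes 4-line FASTQ records; count_adapter_reads is a hypothetical helper.
count_adapter_reads() {
    awk -v seq="$2" 'NR % 4 == 2 && index($0, seq) > 0 { n++ } END { print n + 0 }' "$1"
}

# First 19 bases of the Nextera sequence saved in nextera.fa
# (the prefix length is an illustrative choice).
adapter_prefix=CTGTCTCTTATACACATCT

# Compare raw and trimmed files for each mate, skipping files that are absent.
for f in SRR1553606_1.fastq SRR1553606_trimmed_1.fastq \
         SRR1553606_2.fastq SRR1553606_trimmed_2.fastq; do
    if [ -f "$f" ]; then
        echo "$f: $(count_adapter_reads "$f" "$adapter_prefix") reads with adapter prefix"
    fi
done
```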