Lesson 8 Practice
Participants will practice trimming with Trimmomatic in this help session. Instead of the HBR-UHR data, they will use data downloaded from the SRA, because the HBR-UHR FastQC report indicated that there is no adapter contamination.
Before getting started, be sure to sign on to Biowulf.
Then change into the participant's data directory.
Create a new folder called `practice_trim` and change into it.
Request an interactive session with 12 GB of memory (RAM) and 10 GB of local temporary storage.
Use the SRA Toolkit to obtain the first 10,000 sequences for SRR1553606. These are paired-end sequences.
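After the download, you can quickly confirm that the data really is paired-end. Both files should hold the same number of reads, and the mates should line up record by record. The sketch below does this in plain Python. It assumes uncompressed four-line FASTQ records, and it assumes both mates share the first header token (for example `@SRR1553606.1 ...` in each file). That header layout is an assumption, not something this lesson states.

```python
"""Sanity-check a pair of FASTQ files: equal record counts and matching read IDs.

Assumptions: plain (uncompressed) 4-line FASTQ records, and both mates carry
the same first header token, optionally ending in /1 or /2.
"""
from itertools import zip_longest
from typing import Iterator, Tuple


def fastq_records(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) for each 4-line FASTQ record."""
    with open(path) as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                return  # end of file
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip("\n")
            if not header.startswith("@") or len(seq) != len(qual):
                raise ValueError(f"malformed FASTQ record in {path}: {header!r}")
            yield header, seq, qual


def read_id(header: str) -> str:
    """ID shared by both mates: first header token without a /1 or /2 suffix."""
    token = header.lstrip("@").split()[0]
    return token[:-2] if token.endswith(("/1", "/2")) else token


def check_pair(path_1: str, path_2: str) -> int:
    """Return the number of read pairs; raise if the two files fall out of step."""
    pairs = 0
    for rec_1, rec_2 in zip_longest(fastq_records(path_1), fastq_records(path_2)):
        if rec_1 is None or rec_2 is None:
            raise ValueError("files contain different numbers of reads")
        if read_id(rec_1[0]) != read_id(rec_2[0]):
            raise ValueError(f"mate IDs differ: {rec_1[0]!r} vs {rec_2[0]!r}")
        pairs += 1
    return pairs
```

For example, `print(check_pair("SRR1553606_1.fastq", "SRR1553606_2.fastq"))` should print a count that matches the number of sequences you asked the SRA Toolkit for.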
Solution: Stay in `practice_trim` and list its contents.
QC the two FASTQ files for SRR1553606 (i.e., SRR1553606_1.fastq and SRR1553606_2.fastq).
Copy the HTML FastQC reports to the Downloads folder on your local computer to view the results. Is there adapter contamination?
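The FastQC report is the main tool for this question. A rough cross-check is to count how many reads contain a short stretch of the suspected adapter. The sketch below is a simplified stand-in and does not reproduce FastQC's "Adapter Content" algorithm. The probe sequence is a hypothetical parameter you supply yourself (for example, part of the adapter you are about to save in `nextera.fa`).

```python
"""Rough adapter screen: what fraction of reads contain a given probe sequence?

Simplified stand-in for FastQC's Adapter Content module, not its actual
algorithm. `probe` is a hypothetical, user-supplied adapter fragment.
"""
from typing import Iterator


def fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line (2nd line of every 4) from a plain FASTQ file."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                yield line.strip()


def adapter_fraction(path: str, probe: str) -> float:
    """Return the fraction of reads whose sequence contains `probe` (exact match)."""
    probe = probe.upper()
    total = hits = 0
    for seq in fastq_sequences(path):
        total += 1
        if probe in seq.upper():
            hits += 1
    return hits / total if total else 0.0
```

Exact substring matching misses reads that carry only a partial adapter at their 3' end. That is one reason dedicated tools such as FastQC and Trimmomatic use more tolerant matching. Treat this number as a lower bound.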
Solution: Change into your local `Downloads` folder and obtain the HTML FastQC reports using `scp`. There appears to be Nextera Transposase contamination in this data. Would this data also require quality trimming?
Try trimming away the adapter sequence below from the FASTQ files, then rerun FastQC.
Hint: copy the above sequence and save it to a file named `nextera.fa`.
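Copy-and-paste can leave stray characters or a missing header behind. It may therefore be worth confirming that `nextera.fa` is well-formed FASTA before passing it to Trimmomatic. The sketch below is a minimal validator. It assumes each record is a `>` header line followed by one or more lines of A/C/G/T/N.

```python
"""Check that an adapter file is well-formed FASTA before handing it to Trimmomatic.

Assumptions: every record is a '>' header followed by lines of A/C/G/T/N;
the file name nextera.fa comes from the lesson.
"""
from typing import Dict, Optional

# Characters accepted in adapter sequences (assumption: plain DNA plus N).
VALID_BASES = set("ACGTN")


def read_adapter_fasta(path: str) -> Dict[str, str]:
    """Parse a small FASTA file into {name: sequence}, rejecting stray characters."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # tolerate blank lines left over from copy and paste
            if line.startswith(">"):
                header = line[1:].strip()
                if not header:
                    raise ValueError("FASTA header line without a name")
                name = header.split()[0]
                if name in records:
                    raise ValueError(f"duplicate adapter name: {name}")
                records[name] = ""
            elif name is None:
                raise ValueError("sequence found before the first '>' header")
            else:
                bad = set(line.upper()) - VALID_BASES
                if bad:
                    raise ValueError(f"unexpected characters in {name}: {sorted(bad)}")
                records[name] += line.upper()
    empty = [n for n, seq in records.items() if not seq]
    if empty:
        raise ValueError(f"adapters with no sequence: {empty}")
    return records
```

Running `print(read_adapter_fasta("nextera.fa"))` should list one entry per adapter you pasted. If it complains about a missing header, add a line such as `>Nextera` (a hypothetical name) above the sequence.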
Solution: Copy and paste the Nextera sequence above into the editor, press Control-X, then save and exit the editor.
Trim away the sequence in `nextera.fa` using Trimmomatic.
- For quality trimming, use a SLIDINGWINDOW with a window size of 4 bases and a required average quality of 30.
- For adapter trimming, set the adapter stringencies as follows:
  - seed mismatches: 2
  - palindrome clip threshold: 30
  - simple clip threshold: 5
- Retain only those trimmed sequences whose length is greater than 50 bases.
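The settings above map onto Trimmomatic's step syntax: `ILLUMINACLIP:<adapters>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>`, `SLIDINGWINDOW:<window size>:<required quality>` and `MINLEN:<length>`. Steps are applied in the order they appear on the command line. The sketch below assembles the paired-end argument list in Python and adds a toy version of the quality steps. Assumptions: the jar path and the four output file names are hypothetical. The toy trimmer simply cuts at the first window whose mean falls below the threshold, which simplifies Trimmomatic's own code rather than reproducing it. Note also that MINLEN drops reads *shorter* than the given length, so `MINLEN:50` keeps reads of 50 bases or more.

```python
"""Map the lesson's Trimmomatic settings onto a command line, plus a toy
version of what SLIDINGWINDOW and MINLEN do to one read.

Hypothetical: the jar path and output file names. The toy trimmer is a
simplification, not Trimmomatic's implementation.
"""
from typing import List


def trimmomatic_pe_command(jar: str, r1: str, r2: str, prefix: str) -> List[str]:
    """Build argv for Trimmomatic paired-end mode with the lesson's parameters."""
    return [
        "java", "-jar", jar, "PE",
        r1, r2,
        # PE mode writes paired + unpaired output for each mate (names hypothetical)
        f"{prefix}_1_paired.fastq", f"{prefix}_1_unpaired.fastq",
        f"{prefix}_2_paired.fastq", f"{prefix}_2_unpaired.fastq",
        # seed mismatches 2, palindrome clip threshold 30, simple clip threshold 5
        "ILLUMINACLIP:nextera.fa:2:30:5",
        # 4-base window, required average quality 30
        "SLIDINGWINDOW:4:30",
        # drop reads shorter than 50 bases after the steps above
        "MINLEN:50",
    ]


def phred33(quality_line: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_line]


def toy_sliding_window(seq: str, qual: str, window: int = 4, required: float = 30.0) -> str:
    """Cut the read at the start of the first window whose mean quality < required."""
    scores = phred33(qual)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < required:
            return seq[:start]
    return seq


def passes_minlen(seq: str, min_len: int = 50) -> bool:
    """Like MINLEN: keep the read only if its length is at least min_len."""
    return len(seq) >= min_len
```

To actually run the tool, you could pass the list to `subprocess.run(..., check=True)` once the jar path is correct for your system. Of the four outputs, the two "paired" files hold reads whose mates also survived. Those are the files to QC next.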
QC the two paired and trimmed FASTQ files.
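Alongside FastQC, a read-length summary is a quick cross-check on the trimmed output. After a MINLEN step, no retained read should fall below the minimum. The two paired files should also report the same number of reads. The sketch below assumes plain four-line FASTQ; the trimmed file names you pass in depend on how you named Trimmomatic's outputs.

```python
"""Summarise read lengths in a FASTQ file: count, shortest, longest, mean.

Assumes plain (uncompressed) 4-line FASTQ records.
"""
from typing import Dict


def length_summary(path: str) -> Dict[str, float]:
    """Return read count and min/max/mean sequence length for a FASTQ file."""
    lengths = []
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # sequence line of each record
                lengths.append(len(line.strip()))
    if not lengths:
        return {"reads": 0, "min": 0, "max": 0, "mean": 0.0}
    return {
        "reads": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }
```

Run it on each trimmed paired file. The `reads` values should match between the two, and `min` should not be below your MINLEN setting.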
Copy the two HTML reports for the trimmed FASTQ files to your local Downloads folder to review the results.
Did trimming remove the adapters and improve the data quality?
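One coarse way to put a number on "improved quality" is to compare the overall mean base quality of a file before and after trimming. The sketch below assumes Phred+33 encoding and plain four-line FASTQ. A single average hides positional effects, so FastQC's per-base quality plot remains the better view. A higher mean also says nothing about adapters; check that in the Adapter Content section of the new reports.

```python
"""Overall mean base quality of a FASTQ file (Phred+33 assumed)."""


def mean_phred33(path: str) -> float:
    """Average quality score over every base in the file; 0.0 for an empty file."""
    total = bases = 0
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 3:  # quality line of each record
                qual = line.strip()
                total += sum(ord(ch) - 33 for ch in qual)
                bases += len(qual)
    return total / bases if bases else 0.0
```

For example, call `mean_phred33("SRR1553606_1.fastq")` and then call it again on the matching trimmed paired file. Whatever name you gave that output is the one to pass in.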