09. Quality trimming of sequence data copy
This page uses content directly from the Biostar Handbook by Istvan Albert.
Learn: using trimmomatic to remove low-quality bases from a sequence
Always remember to activate the bioinformatics environment.
conda activate bioinfo
We will be downloading some new sequence data. Create a new directory within your sequences directory and name it "trimming".
mkdir trimming
cd trimming
Download fastq sequences from NCBI/SRA.
fastq-dump --split-files -X 10000 SRR1553607
2020-05-15T18:05:34 srapath.2.10.1 int: buffer insufficient while reading uri within cloud module - cannot Get Cloud Location
Read 10000 spots for SRR1553607
Written 10000 spots for SRR1553607
trimmomatic SE SRR1553607_2.fastq trimmed_2.fq SLIDINGWINDOW:4:30
SLIDINGWINDOW: Performs a sliding window trimming approach. It starts scanning at the 5‟ end and clips the read once the average quality within the window falls below a threshold.
SLIDINGWINDOW:
Prints the following to the screen.
TrimmomaticSE: Started with arguments:
SRR1553607_2.fastq trimmed_2.fq SLIDINGWINDOW:4:30
Automatically using 4 threads
Quality encoding detected as phred33
Input Reads: 10000 Surviving: 9277 (92.77%) Dropped: 723 (7.23%)
TrimmomaticSE: Completed successfully
fastqc SRR1553607_2.fastq trimmed_2.fastq
Started analysis of SRR1553607_2.fastq
Approx 10% complete for SRR1553607_2.fastq
Approx 20% complete for SRR1553607_2.fastq
Approx 30% complete for SRR1553607_2.fastq
Approx 40% complete for SRR1553607_2.fastq
Approx 50% complete for SRR1553607_2.fastq
Approx 60% complete for SRR1553607_2.fastq
Approx 70% complete for SRR1553607_2.fastq
Approx 80% complete for SRR1553607_2.fastq
Approx 90% complete for SRR1553607_2.fastq
Approx 100% complete for SRR1553607_2.fastq
Analysis complete for SRR1553607_2.fastq
Started analysis of trimmed_2.fq
Approx 10% complete for trimmed_2.fq
Approx 20% complete for trimmed_2.fq
Approx 30% complete for trimmed_2.fq
Approx 40% complete for trimmed_2.fq
Approx 50% complete for trimmed_2.fq
Approx 65% complete for trimmed_2.fq
Approx 75% complete for trimmed_2.fq
Approx 85% complete for trimmed_2.fq
Approx 95% complete for trimmed_2.fq
Analysis complete for trimmed_2.fq
SRR1553607_2.fastq (Before trimming)
SRR1553607_2.fastq (After trimming)
Another program to trim sequences is BBDuk (Decontamination using Kmers):
bbduk.sh in=SRR1553607_2.fastq out=bbduk.fq qtrim=r overwrite=true trimq=30
qtrim=r (right end only)
overwrite=true (ow) Grant permission to overwrite files.
Trim read ends to remove bases with quality below trimq.
Values: * rl (trim both ends), * f (neither end), * r (right end only), * l (left end only), * w (sliding window).
Giving this output to the screen
/Users/stonelakeak/miniconda3/envs/bioinfo/opt/bbmap-38.79-0//calcmem.sh: line 75: [: -v: unary operator expected
Max memory cannot be determined. Attempting to use 1400 MB.
If this fails, please add the -Xmx flag (e.g. -Xmx24g) to your command,
or run this program qsubbed or from a qlogin session on Genepool, or set ulimit to an appropriate value.
java -ea -Xmx1400m -Xms1400m -cp /Users/stonelakeak/miniconda3/envs/bioinfo/opt/bbmap-38.79-0/current/ jgi.BBDuk in=SRR1553607_2.fastq out=bbduk.fq qtrim=r overwrite=true qtrim=30
Executing jgi.BBDuk [in=SRR1553607_2.fastq, out=bbduk.fq, qtrim=r, overwrite=true, qtrim=30]
Version 38.79
0.033 seconds.
Initial:
Memory: max=1468m, total=1468m, free=1439m, used=29m
Input is being processed as unpaired
Started output streams: 0.043 seconds.
Processing time: 0.092 seconds.
Input: 10000 reads 1010000 bases.
QTrimmed: 4188 reads (41.88%) 181620 bases (17.98%)
Total Removed: 154 reads (1.54%) 181620 bases (17.98%)
Result: 9846 reads (98.46%) 828380 bases (82.02%)
Time: 0.136 seconds.
Reads Processed: 10000 73.37k reads/sec
Bases Processed: 1010k 7.41m bases/sec
Now we can run fastqc on the before and after BBDuk trimmed sequences.
fastqc SRR1553607_2.fastq bbduk.fq
SRR1553607_2.fastq -> bbduk.fq (trimmed with bbduk)
To trim paired-end sequences with trimmomatic:
trimmomatic PE SRR1553607_1.fastq SRR1553607_2.fastq trimmed_1.fq unpaired_1.fq trimmed_2.fq unpaired_2.fq SLIDINGWINDOW:4:30

Here's how to do the same with bbduk (see also: BBTools User Guide)
bbduk.sh in1=SRR1553607_1.fastq in2=SRR1553607_2.fastq outm1=bbduk_1.fq out1=unpaired_bb_1.fq outm2=bbduk_2.fq out2=unpaired_bb_2.fq qtrim=r overwrite=true qtrim=30