Lesson 7 Practice

In this session, participants practice assessing FASTQ quality using data from the HBR-UHR study (see https://rnabio.org/module-01-inputs/0001/05/01/RNAseq_Data/).

The first step is to make sure that the participant is signed in to Biowulf. If not, sign in with the command below. Remember to replace user with the participant's own Biowulf login ID.

Solution
ssh user@biowulf.nih.gov

Then change into the data directory.

Solution
cd /data/user

The practice data is stored in the folder hbr_uhr_b4b inside /data/classes/BTEP. Copy it to the participant's data directory.

Solution
cp -r /data/classes/BTEP/hbr_uhr_b4b .

Change into the hbr_uhr_b4b folder in the participant's data directory.

Solution
cd hbr_uhr_b4b

Request an interactive session with 12 GB of memory (RAM) and 10 GB of local temporary storage space.

Solution
sinteractive --mem=12gb --gres=lscratch:10
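On Biowulf, the local scratch space requested with `--gres=lscratch:10` is made available under `/lscratch/$SLURM_JOB_ID` once the session starts. The sketch below is one way to confirm that the session has that space. The function name `scratch_dir` is hypothetical and introduced only for illustration.

```shell
# Hypothetical helper: print the job's local scratch directory,
# or fail if no Slurm job (and therefore no lscratch) is active.
scratch_dir() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        printf '/lscratch/%s\n' "$SLURM_JOB_ID"
    else
        echo "not inside a Slurm job; run sinteractive first" >&2
        return 1
    fi
}
```

Inside the interactive session, `ls -ld "$(scratch_dir)"` should list the scratch directory allocated to the job.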

What are the contents of hbr_uhr_b4b?

Solution
ls -l
drwxr-x---. 2 wuz8 wuz8 4096 Dec 20 15:41 reads
drwxr-x---. 2 wuz8 wuz8 4096 Dec 20 14:37 references
There are two folders, `reads` and `references`. The `reads` folder contains the FASTQ files. There are 12 of these, two (R1 and R2) for each of the six samples.
ls -1 reads
HBR_Rep1_R1.fq
HBR_Rep1_R2.fq
HBR_Rep2_R1.fq
HBR_Rep2_R2.fq
HBR_Rep3_R1.fq
HBR_Rep3_R2.fq
UHR_Rep1_R1.fq
UHR_Rep1_R2.fq
UHR_Rep2_R1.fq
UHR_Rep2_R2.fq
UHR_Rep3_R1.fq
UHR_Rep3_R2.fq
The `references` folder contains the chromosome 22 reference sequence (fa, or FASTA) and annotation (gtf) files.
ls references
22.fa  22.gtf
There is also a file, `hbr_uhr_samples.txt`, that contains the HBR-UHR sample IDs.
cat hbr_uhr_samples.txt
HBR_Rep1
HBR_Rep2
HBR_Rep3
UHR_Rep1
UHR_Rep2
UHR_Rep3
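The sample list and the `_R1.fq`/`_R2.fq` naming shown above can be combined into a quick sanity check before running any QC. The sketch below assumes standard four-line FASTQ records (no wrapped sequence lines). The function name `check_pairs` is hypothetical.

```shell
# Hypothetical helper: for each sample ID listed in a text file, confirm that
# both mates exist and contain the same number of reads (4 lines per record).
check_pairs() {
    samples_file=$1
    reads_dir=$2
    status=0
    while IFS= read -r sample; do
        [ -n "$sample" ] || continue            # skip blank lines
        r1="$reads_dir/${sample}_R1.fq"
        r2="$reads_dir/${sample}_R2.fq"
        if [ ! -f "$r1" ] || [ ! -f "$r2" ]; then
            echo "$sample: missing R1 or R2 file"
            status=1
            continue
        fi
        n1=$(( $(wc -l < "$r1") / 4 ))          # reads in R1
        n2=$(( $(wc -l < "$r2") / 4 ))          # reads in R2
        if [ "$n1" -eq "$n2" ]; then
            echo "$sample: $n1 read pairs"
        else
            echo "$sample: R1 has $n1 reads, R2 has $n2"
            status=1
        fi
    done < "$samples_file"
    return "$status"
}
```

From inside hbr_uhr_b4b, it would be called as `check_pairs hbr_uhr_samples.txt reads`. A non-zero exit status means at least one sample has a missing or mismatched file.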

Make a directory called hbr_uhr_b4b_raw_qc inside /data/user/hbr_uhr_b4b.

Solution
mkdir hbr_uhr_b4b_raw_qc

Load FastQC.

Solution
module load fastqc

Run FastQC on the raw FASTQ files and save the results in hbr_uhr_b4b_raw_qc. Stay in /data/user/hbr_uhr_b4b for this.

Solution
fastqc reads/*.fq -o hbr_uhr_b4b_raw_qc
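After the run finishes, it can be worth confirming that every FASTQ file produced a report. The sketch below assumes FastQC's default output naming, where `X.fq` yields `X_fastqc.html` and `X_fastqc.zip`. The function name `missing_fastqc` is hypothetical.

```shell
# Hypothetical helper: list FASTQ files that lack a FastQC report.
# Assumes FastQC's default naming: X.fq -> X_fastqc.html and X_fastqc.zip.
missing_fastqc() {
    reads_dir=$1
    qc_dir=$2
    missing=0
    for fq in "$reads_dir"/*.fq; do
        [ -e "$fq" ] || continue                # no .fq files matched
        base=$(basename "$fq" .fq)
        for ext in html zip; do
            # -s: the file exists and is not empty
            if [ ! -s "$qc_dir/${base}_fastqc.$ext" ]; then
                echo "missing: ${base}_fastqc.$ext"
                missing=$((missing + 1))
            fi
        done
    done
    [ "$missing" -eq 0 ]                        # succeed only if nothing is missing
}
```

Running `missing_fastqc reads hbr_uhr_b4b_raw_qc` from /data/user/hbr_uhr_b4b should print nothing when all 24 expected files (an html and a zip for each of the 12 FASTQ files) are present.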

Change into hbr_uhr_b4b_raw_qc.

Solution
cd hbr_uhr_b4b_raw_qc

Load and run MultiQC to combine all of the FastQC reports in hbr_uhr_b4b_raw_qc into a single report. Name the MultiQC report hbr_uhr_b4b_raw_qc.

Solution
module load multiqc
multiqc --filename hbr_uhr_b4b_raw_qc .

Copy the hbr_uhr_b4b_raw_qc.html MultiQC report to the local Downloads folder to view it.

Solution
Open a new Terminal (Mac) or Command Prompt (Windows) on the local computer. Then change into the local `Downloads` folder.
cd Downloads
scp user@helix.nih.gov:/data/user/hbr_uhr_b4b/hbr_uhr_b4b_raw_qc/hbr_uhr_b4b_raw_qc.html .

Based on the MultiQC report for the raw FASTQ files, what is the next step?

Solution
Align the data to the reference genome, since the quality of the data is good and there does not appear to be adapter contamination.
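The same judgment can be cross-checked on the command line. Each FastQC zip archive contains a `summary.txt` file with one tab-separated line per module (status, module name, file name). The sketch below prints only the modules that did not PASS. The function name `non_pass_modules` is hypothetical.

```shell
# Hypothetical helper: read FastQC summary.txt lines (status<TAB>module<TAB>file)
# on standard input and print file, status, and module for every non-PASS entry.
non_pass_modules() {
    awk -F '\t' 'NF >= 3 && $1 != "PASS" { printf "%s\t%s\t%s\n", $3, $1, $2 }'
}
```

Assuming `unzip` is available, one file could be checked from inside hbr_uhr_b4b_raw_qc with `unzip -p HBR_Rep1_R1_fastqc.zip HBR_Rep1_R1_fastqc/summary.txt | non_pass_modules`. A WARN or FAIL is a prompt to look at the corresponding plot in the report, not by itself a reason to discard the data.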