Lesson 9 Practice

Since the HBR-UHR data did not need trimming, this practice session will have participants align this data to the chromosome 22 human reference.

Sign onto Biowulf and change into the /data/user/hbr_uhr_b4b folder.

Solution

ssh user@biowulf.nih.gov

cd /data/user/hbr_uhr_b4b

Request an interactive session with 12 gb RAM (or memory) and 10 gb of local temporary storage.

Solution

sinteractive --mem=12gb --gres=lscratch:10

Create a new folder in /data/user/hbr_uhr_b4b called hbr_uhr_hisat2.

Solution

mkdir hbr_uhr_hisat2

Load HISAT2.

Solution

module load hisat2

Stay in the folder hbr_uhr_b4b for this exercise.

Build HISAT2 indices for the chromosome 22 reference genome. The file for the chromosome 22 genome is located in the folder references and the file name is 22.fa. Give the indices a base name (ie. file name without the extension) of 22. List the contents of the references folder to make sure that the build succeeded (ie. the .ht2 files are present).

Solution

hisat2-build references/22.fa references/22

After building the indices, align the HBR-UHR FASTQ files to genome. Use the parallel command for this and store the alignment output in the folder hbr_uhr_hisat2.

Solution

cat hbr_uhr_samples.txt | parallel "hisat2 -x references/22 -1 reads/{}_R1.fq -2 reads/{}_R2.fq -S hbr_uhr_hisat2/{}.sam --summary-file hbr_uhr_hisat2/{}_summary.txt"

Look at the overall alignment rates in the HISAT2 alignment summary files. Why are they low?

Solution

cat hbr_uhr_hisat2/*summary.txt | grep "overall"

52.75% overall alignment rate
52.51% overall alignment rate
52.68% overall alignment rate
47.91% overall alignment rate
51.26% overall alignment rate
46.94% overall alignment rate

"In addition, a spike-in control was used. Specifically we added an aliquot of the ERCC ExFold RNA Spike-In Control Mixes to each sample." -- (Griffith lab RNA Bio, https://rnabio.org/module-01-inputs/0001/05/01/RNAseq_Data/) This dataset has ERCC spike-ins as a quality control measure so not all sequences will map to the human chromosome 22 genome.

Create sorted BAM files and BAM indices for the alignment results.

Solution

Create BAM from SAM.

cat hbr_uhr_samples.txt | parallel "samtools sort -o hbr_uhr_hisat2/{}.bam hbr_uhr_hisat2/{}.sam"

Index BAM.

cat hbr_uhr_samples.txt | parallel "samtools index -b hbr_uhr_hisat2/{}.bam hbr_uhr_hisat2/{}.bam.bai"