Lesson 9 Practice
Because the HBR-UHR data did not need trimming, in this practice session participants will align it directly to the human chromosome 22 reference genome.
Sign on to Biowulf and change into the /data/user/hbr_uhr_b4b folder.
Request an interactive session with 12 GB of memory (RAM) and 10 GB of local temporary storage.
Create a new folder in /data/user/hbr_uhr_b4b called hbr_uhr_hisat2.
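One way to create the output folder, using the full path from the lesson so the command works from any directory (the same "user" placeholder caveat applies):

```shell
# Create the folder that will hold the HISAT2 alignment output;
# -p suppresses the error if the folder already exists
mkdir -p /data/user/hbr_uhr_b4b/hbr_uhr_hisat2
```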
Load HISAT2.
Stay in the hbr_uhr_b4b folder for this exercise.
Build HISAT2 indices for the chromosome 22 reference genome. The chromosome 22 genome file, 22.fa, is located in the references folder. Give the indices a base name (i.e., the file name without the extension) of 22. List the contents of the references folder to confirm that the build succeeded (i.e., the .ht2 files are present).
After building the indices, align the HBR-UHR FASTQ files to the genome. Use the parallel command for this and store the alignment output in the hbr_uhr_hisat2 folder.
Look at the overall alignment rates in the HISAT2 alignment summary files. Why are they low?
Solution
"In addition, a spike-in control was used. Specifically we added an aliquot of the ERCC ExFold RNA Spike-In Control Mixes to each sample." (Griffith lab, RNA Bio: https://rnabio.org/module-01-inputs/0001/05/01/RNAseq_Data/)
This dataset contains ERCC spike-in controls as a quality-control measure, so not all reads will map to the human chromosome 22 genome: reads from the synthetic spike-in RNAs have no match in that reference.
Create sorted BAM files and BAM indices for the alignment results.