Mapping Sequences to Genome

After QC and quality and/or adapter trimming, the next step is to map the sequences in the FASTQ files to the reference genome. RNA sequencing analysis requires a splice-aware aligner such as HISAT2 or STAR, because reads that span exon-exon junctions must be aligned across introns. Both aligners are available in Partek Flow. This class uses HISAT2.

Note

The dataset used for this class, hcc1395, was subset to human chromosome 22, so the sequences will be mapped to the hg38 chromosome 22 reference. See the Partek Flow documentation to learn how to add references and annotation files (e.g., GTF files) to a user's Partek Flow account.

To map using HISAT2, click on the "Trimmed reads" data node and select "Aligners" in the menu. Then, click on "HISAT2". On the subsequent page, users will see an "Assembly" drop-down box, which is used to select the index for the desired reference.

Note

HISAT2 indexes the reference prior to alignment in order to speed up the process.
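
Outside Partek Flow, this indexing step corresponds to the hisat2-build command-line tool. The following is a minimal sketch that only assembles that command. The file names hg38_chr22.fa and hg38_chr22_index are hypothetical placeholders, and actually running the command assumes HISAT2 is installed and on the PATH.

```python
import shlex
import subprocess


def build_index_command(reference_fasta, index_basename, threads=1):
    """Return the hisat2-build argument list for one reference FASTA."""
    # hisat2-build usage: hisat2-build [options] <reference_in> <ht2_base>
    return ["hisat2-build", "-p", str(threads), reference_fasta, index_basename]


def run(cmd):
    """Run a command and raise an error if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; replace them with real paths.
    cmd = build_index_command("hg38_chr22.fa", "hg38_chr22_index", threads=4)
    print(shlex.join(cmd))
    # Call run(cmd) to build the index; this requires HISAT2 on the PATH.
```

Building the argument list separately from running it makes it easy to inspect the command before anything is executed.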

If the reference index is not available in this drop-down, as in this example, scroll and select "New assembly". In the subsequent dialogue box, labeled "Add HISAT2 index", keep the species as "Homo sapiens (human)". Then select the reference file or assembly that needs to be indexed, in this case "hg38_chromosome22". Keep the "Create option" set to build, because HISAT2 needs to index the reference prior to alignment. Click "Create" when ready. When this step is done, the "Index" drop-down will populate with the newly built HISAT2 index. Finally, click "Finish" to start the alignment. HISAT2 will run with default settings, although users can adjust them to meet their alignment stringency needs by clicking "Configure" next to "Advanced options".
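
For readers curious about what the alignment step looks like outside the Partek Flow interface, the sketch below assembles a comparable hisat2 command with default alignment settings. It is an illustration only, not what Partek Flow runs internally. The index basename and FASTQ and SAM file names are hypothetical placeholders, and the helper name hisat2_align_command is introduced here for illustration.

```python
import shlex


def hisat2_align_command(index_basename, output_sam, reads_1, reads_2=None, threads=1):
    """Return the hisat2 argument list for single-end or paired-end reads."""
    # -x gives the basename of the index built beforehand with hisat2-build.
    cmd = ["hisat2", "-p", str(threads), "-x", index_basename]
    if reads_2 is None:
        # Single-end (unpaired) reads are passed with -U.
        cmd += ["-U", reads_1]
    else:
        # Paired-end reads are passed as mate 1 (-1) and mate 2 (-2).
        cmd += ["-1", reads_1, "-2", reads_2]
    # -S names the SAM file that receives the alignments.
    cmd += ["-S", output_sam]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; replace them with real paths.
    cmd = hisat2_align_command(
        "hg38_chr22_index",
        "sample_chr22.sam",
        "sample_R1.trimmed.fastq.gz",
        "sample_R2.trimmed.fastq.gz",
        threads=4,
    )
    # Print the command for inspection; running it requires HISAT2 on the PATH.
    print(shlex.join(cmd))
```

No alignment options beyond the thread count are set, which mirrors leaving "Advanced options" at their defaults.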