Lesson 14: Visualizing Genomic Data: Part 2

Lesson 13 Review

In Lesson 13, participants learned how to prepare files (i.e., bigWig files) for visualizing genomic data.

Learning Objectives

After this lesson, participants will:

Have a high-level understanding of the Integrative Genomics Viewer (IGV), which is used for visualizing genomic data.
Know the difference between visualizing genomic information from a bigWig file and from a BAM file.
Understand the difference in output obtained from a splice-aware versus a non-splice-aware sequence aligner.

Launch IGV From HPC OnDemand

Launch the IGV session on NIH HPC OnDemand by clicking on "Interactive Apps" and then choosing "IGV".

In the subsequent page, users will be able to select compute resources for the IGV session. Starting an application on HPC OnDemand will consume one of the two interactive sessions. Click on the "Launch" button when ready.

Once the IGV session's compute resources have been allocated, click on "Launch IGV" to get started.

Select "Human (hg38)" as the reference in the genome selection drop-down menu.

A track showing the genes for hg38 will appear. Right click on this track to see the configuration options, which include:

Track color and font size.
How densely the data should be displayed on the track (i.e., collapsed, expanded, or squished).

Viewing Coverage with bigWig Files

To load genomic data tracks, select "File" in the IGV menu bar. Users can load data from a file, a URL, or a server. In this case, use "Load from File" to select the alignment coverage bigWig files in the /data/user/hcc1395_b4b/hcc1395_hisat2 folder.

The bigWig files show pre-computed alignment coverage, and the peaks make it clear that reads have aligned only to chromosome 22. Either click on the "22" above the peak or select chromosome 22 from the chromosome selection drop-down menu.

After restricting the view to chromosome 22, the coverage data and the genes on this chromosome become visible.

Next, right click on the track labeled "normal_rep1.bw" and change its color to help distinguish it from the tumor sample (i.e., tumor_rep1.bw).

Change the "normal_rep1.bw" track to red.

Next, select both the normal and tumor bigWig tracks, right click, set the Windowing Function to "None", and choose "Group Autoscale" so that the data ranges of both tracks are shown on the same scale.

Note

Regarding the Windowing function: "When the view is zoomed out, each pixel on the screen may represent a genomic region that encompasses multiple numeric values in the data. The windowing function specifies which of the multiple values to display. To set the function, select one of the options in the Windowing Function section of the track right-click pop-up menu. The available options will depend on the file type, but most include: Minimum, Mean, Maximum, and None. By default, the function is set to Mean. The None option will display all the values, rather than combining them into one value, which can be useful for tracks displayed as points". -- IGV
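To make the windowing idea concrete, here is a minimal Python sketch of how several per-base values that fall under one screen pixel can be collapsed into a single displayed value. The helper name window_values, the fixed "bases per pixel" grouping, and the signal values are all hypothetical illustrations of the concept described in the IGV quote, not IGV's actual implementation.

```python
def window_values(values, bases_per_pixel, function="mean"):
    """Collapse per-base signal values into one value per screen pixel.

    Hypothetical helper that mimics the idea behind IGV's Windowing
    Function menu; it is not IGV's actual implementation.
    """
    # Group consecutive values that fall under the same pixel
    bins = [values[i:i + bases_per_pixel]
            for i in range(0, len(values), bases_per_pixel)]
    if function == "none":
        # "None": keep every value in each pixel rather than combining them
        return bins
    reducers = {
        "min": min,
        "max": max,
        "mean": lambda b: sum(b) / len(b),
    }
    return [reducers[function](b) for b in bins]


signal = [0, 2, 4, 10, 1, 1, 3, 5]  # hypothetical per-base coverage
for fn in ("min", "mean", "max", "none"):
    print(fn, window_values(signal, 4, fn))
# min [0, 1]
# mean [4.0, 2.5]
# max [10, 5]
# none [[0, 2, 4, 10], [1, 1, 3, 5]]
```

With four bases per pixel, Mean reports a peak of 4.0 for the first pixel while Max reports 10, which is why the chosen windowing function can change how tall a zoomed-out peak appears.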

Regarding group autoscale: "When comparing several sets of tracks, it is helpful to scale them on the same axis using the “Group Autoscale” option." -- https://eclipsebio.com/eblogs/how-to-use-igv-1/
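The difference between per-track autoscale and group autoscale can be sketched in a few lines of Python. This is a simplified model, assuming each track's axis simply runs from 0 (or the track minimum, if negative) to the track maximum; the function names and coverage values are hypothetical and only illustrate why a shared axis matters when comparing normal_rep1.bw with tumor_rep1.bw.

```python
def autoscale(values):
    """Per-track autoscale: axis runs from 0 (or the track minimum) to the track maximum."""
    return (min(0, min(values)), max(values))


def group_autoscale(tracks):
    """Group autoscale: every track in the group shares one axis range."""
    lo = min(autoscale(v)[0] for v in tracks.values())
    hi = max(autoscale(v)[1] for v in tracks.values())
    return {name: (lo, hi) for name in tracks}


# Hypothetical coverage values for the visible region
tracks = {
    "normal_rep1.bw": [0, 3, 12, 5],
    "tumor_rep1.bw": [0, 8, 40, 20],
}
print({name: autoscale(v) for name, v in tracks.items()})
# {'normal_rep1.bw': (0, 12), 'tumor_rep1.bw': (0, 40)}
print(group_autoscale(tracks))
# {'normal_rep1.bw': (0, 40), 'tumor_rep1.bw': (0, 40)}
```

With independent autoscaling, a peak of 12 and a peak of 40 would both fill their track's full height and look alike; on the shared (0, 40) axis, the difference between the samples becomes visible.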

Search for the gene TOB2. From this IGV view, TOB2 appears to be more highly expressed in the "tumor_rep1" sample than in the "normal_rep1" sample, because more reads align to TOB2 in "tumor_rep1". In the expanded view of the gene track, the transcript isoforms are shown. From the IGV image below, which one of the TOB2 transcripts is expressed?

Viewing BAM files in IGV

For this exercise, click on the chromosome selection drop-down and choose "All". Then load normal_rep1.bam and tumor_rep1.bam as tracks. Unlike the bigWig files, which show pre-calculated coverage in IGV as soon as they are loaded, BAM files require users to zoom in to a specific location in order to see the coverage information.

Upon zooming into TOB2, users will see that the BAM files contain more information than the bigWig files. This includes:

Coverage information (also shown in the bigWigs).
The actual alignments.
Splice junctions (the parts of a sequencing read that align to different exons are connected by solid lines in the alignment track).
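Both the coverage and the splice junctions listed above can be derived from the alignments themselves. In the SAM/BAM format, each alignment has a 1-based start position (POS) and a CIGAR string; splice-aware aligners such as HISAT2 mark a skipped intron with the N operation. The minimal Python sketch below, using hypothetical reads, walks the CIGAR string to collect aligned blocks and junctions and then counts per-base depth. The function names are illustrative, and counting only M/=/X bases as coverage is a simplifying assumption, not a description of exactly how IGV builds its coverage track.

```python
import re
from collections import Counter

# CIGAR operations: length followed by one operation letter
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_alignment(pos, cigar):
    """Return (aligned_blocks, junctions) as 1-based inclusive reference intervals."""
    blocks, junctions = [], []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            blocks.append((ref, ref + n - 1))      # bases aligned to the reference
            ref += n
        elif op == "N":
            junctions.append((ref, ref + n - 1))   # intron skipped by a spliced read
            ref += n
        elif op == "D":
            ref += n                                # deletion consumes reference only
        # I, S, H and P do not consume reference positions
    return blocks, junctions


def coverage(reads):
    """Count per-base depth and how many reads support each junction."""
    depth = Counter()
    junction_counts = Counter()
    for pos, cigar in reads:
        blocks, junctions = parse_alignment(pos, cigar)
        for start, end in blocks:
            for p in range(start, end + 1):
                depth[p] += 1
        junction_counts.update(junctions)
    return depth, junction_counts


reads = [(100, "5M10N5M"), (102, "3M10N7M"), (110, "10M")]  # hypothetical (POS, CIGAR)
depth, junction_counts = coverage(reads)
print([depth[p] for p in (100, 103, 112, 116)])  # [1, 2, 1, 3]
print(dict(junction_counts))                      # {(105, 114): 2}
```

The two spliced reads drop out of the coverage across positions 105 to 114 and instead contribute to a junction, which is the pattern IGV draws as a line connecting the exon-aligned parts of a read. A non-splice-aware aligner would not emit N operations, so such reads would typically be soft-clipped or left unaligned.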

Zoom in further and a potential single nucleotide variant becomes apparent: where the reads contain a T, the reference contains a C. Users can also view insertions and deletions in IGV when looking at BAM files.
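Conceptually, a candidate variant like this C-to-T change is spotted by tallying the bases the reads carry at one reference position and comparing them with the reference base. The Python sketch below does that for a hypothetical pileup. The function name allele_summary and the min_fraction cutoff are illustrative assumptions, not IGV settings, and a real variant call would also consider base and mapping quality.

```python
from collections import Counter


def allele_summary(ref_base, observed_bases, min_fraction=0.2):
    """Summarize the bases that reads show at one reference position.

    min_fraction is a hypothetical cutoff for flagging a non-reference
    allele; it is not taken from IGV.
    """
    counts = Counter(base.upper() for base in observed_bases)
    depth = sum(counts.values())
    alt_fractions = {
        base: n / depth
        for base, n in counts.items()
        if base != ref_base.upper()
    }
    flagged = {base: f for base, f in alt_fractions.items() if f >= min_fraction}
    return depth, alt_fractions, flagged


# Hypothetical pileup: reference base is C, most reads show T
depth, alt, flagged = allele_summary("C", "CCTTTCTT")
print(depth, alt, flagged)  # 8 {'T': 0.625} {'T': 0.625}
```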

GRK3 is more highly expressed in the normal_rep1 sample than in the tumor_rep1 sample.

Windowing function: None
Group autoscale: on