
Lesson 14 Practice

In this practice session, participants will run the actual differential gene expression analysis on the HBR-UHR gene expression data.

Before getting started, make sure to be connected to Biowulf and to have an interactive session with 12 GB of memory and 10 GB of local temporary storage space. Remember to replace user with the participant's assigned Biowulf student account ID.

Next, change into the /data/user/hbr_uhr_b4b folder.

Solution
cd /data/user/hbr_uhr_b4b

Load R.


Solution
module load R

Run the deg2.R script in the folder b4b_scripts to generate differential expression analysis results. Use hbr_uhr for the study name and write the output to hbr_uhr_deg.

Solution
Rscript b4b_scripts/deg.R hbr_uhr_deg/hbr_uhr_gene_expression_filtered.csv hbr_uhr_phenotypes.csv Treatment hbr_uhr hbr_uhr_deg
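Before launching the script, it can help to confirm that the grouping column passed as the third argument (Treatment) actually appears in the header of hbr_uhr_phenotypes.csv. The sketch below assumes a plain comma-separated header; the function name has_column and the toy phenotype file are hypothetical and only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns success (exit status 0) if the header row of a CSV
# contains the given column name. Assumes a comma-separated header; double quotes
# and Windows carriage returns are stripped before comparing.
has_column() {
  head -n 1 "$1" | tr -d '"\r' | tr ',' '\n' | grep -x -- "$2" > /dev/null
}

# Toy phenotype file for demonstration only (not the real hbr_uhr_phenotypes.csv)
demo_dir=$(mktemp -d)
printf 'sample,Treatment\nHBR_1,HBR\nUHR_1,UHR\n' > "$demo_dir/phenotypes.csv"

if has_column "$demo_dir/phenotypes.csv" Treatment; then
  echo "Treatment column found"
else
  echo "Treatment column missing: check the phenotype file before running the R script" >&2
fi
```

On Biowulf, running `has_column hbr_uhr_phenotypes.csv Treatment` from /data/user/hbr_uhr_b4b would perform the same check on the real file.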

After running the differential expression analysis, copy the hbr_uhr_deg folder to the local Downloads folder to review the images.

To do this, open a new Terminal (Mac) or Command Prompt window (Windows 10 or above) and change into the local Downloads folder.

cd Downloads

Then use the scp command below to download the folder. Remember to replace user with the participant's assigned Biowulf student account ID.

scp -r user@helix.nih.gov:/data/user/hbr_uhr_b4b/hbr_uhr_deg .
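In a Mac or Linux terminal (this does not apply to the Windows Command Prompt), one way to avoid typing the account ID in two places is to store it once in a shell variable and build the remote path from it. The variable names BIOWULF_USER and REMOTE are hypothetical, and `user` is the same placeholder as above that must be replaced.

```shell
#!/usr/bin/env bash
# Placeholder: replace "user" with the assigned Biowulf student account ID
BIOWULF_USER="user"

# Build the remote source path once; the ID appears in both the login and the /data path
REMOTE="${BIOWULF_USER}@helix.nih.gov:/data/${BIOWULF_USER}/hbr_uhr_b4b/hbr_uhr_deg"

# Print the command for a visual check; run it without "echo" to actually copy the folder
echo scp -r "$REMOTE" .
```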

How many genes matched the criteria of log2FoldChange >= 4.5 and adjusted p-value <= 0.01, or log2FoldChange <= -4.5 and adjusted p-value <= 0.01?

Solution
wc -l hbr_uhr_deg/hbr_uhr_filter_deg.csv
There are 23 genes that matched the specified log2FoldChange and adjusted p-value thresholds: hbr_uhr_filter_deg.csv has 24 lines, and subtracting 1 line for the column headers leaves 23 genes.
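The same kind of count can be reproduced from a full results table, which also makes the filter explicit: keep genes whose absolute log2FoldChange is at least 4.5 and whose adjusted p-value is at most 0.01, then count data rows only (no header). This sketch assumes a hypothetical unquoted CSV laid out like DESeq2 results with the gene ID prepended (gene, baseMean, log2FoldChange, lfcSE, stat, pvalue, padj), so log2FoldChange is column 3 and padj is column 7. The function name count_deg and the values are invented; the script's real output may use different column positions or quoting, so check its header first.

```shell
#!/usr/bin/env bash
# Hypothetical results table for demonstration; values are invented, not HBR/UHR data
demo_csv=$(mktemp)
cat > "$demo_csv" <<'EOF'
gene,baseMean,log2FoldChange,lfcSE,stat,pvalue,padj
GENE_A,1500,5.2,0.4,13.0,1e-38,1e-35
GENE_B,800,-4.8,0.5,-9.6,1e-21,1e-19
GENE_C,300,4.6,0.9,5.1,3e-07,0.02
GENE_D,950,1.1,0.3,3.7,2e-04,0.001
GENE_E,420,-6.0,1.2,-5.0,5e-07,NA
EOF

# Count genes with |log2FoldChange| >= 4.5 and padj <= 0.01.
# NR > 1 skips the header row; rows whose padj is "NA" are skipped explicitly.
count_deg() {
  awk -F',' -v lfc=4.5 -v alpha=0.01 '
    NR > 1 && $7 != "NA" {
      abs_lfc = ($3 < 0) ? -$3 : $3
      if (abs_lfc >= lfc + 0 && $7 + 0 <= alpha + 0) n++
    }
    END { print n + 0 }
  ' "$1"
}

count_deg "$demo_csv"
```

In this toy table only GENE_A and GENE_B pass both thresholds, so the function prints 2; GENE_C fails the adjusted p-value cutoff and GENE_E has no adjusted p-value.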

What does the PCA plot of the normalized expression table suggest?

Solution
After normalization, the HBR and UHR samples still separate along the first principal component (PC1). However, it appears that one sample from each group does not cluster with the rest of its group along the second principal component (PC2). This could reflect biological differences between samples within a condition, or a batch effect.

Did normalization help bring the gene expression distributions of the individual samples closer to one another?

Solution
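The distribution plots are the primary evidence for this question, but a quick numeric complement is to compare each sample's median expression before and after normalization: if normalization worked as intended, the normalized medians should sit much closer together than the raw ones. The sketch below assumes a hypothetical unquoted CSV with gene IDs in the first column and one numeric column per sample; the function name sample_medians and the toy values are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "sample<TAB>median" for each sample column of a CSV
# (column 1 = gene ID, columns 2..N = samples, first row = sample names).
sample_medians() {
  ncol=$(head -n 1 "$1" | awk -F',' '{ print NF }')
  i=2
  while [ "$i" -le "$ncol" ]; do
    name=$(head -n 1 "$1" | cut -d',' -f"$i")
    # Take column i without the header, sort it numerically, then pick the middle value
    # (or the mean of the two middle values when the number of genes is even)
    tail -n +2 "$1" | cut -d',' -f"$i" | sort -n | awk -v s="$name" '
      { v[NR] = $1 }
      END {
        m = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
        printf "%s\t%g\n", s, m
      }'
    i=$((i + 1))
  done
}

# Toy raw and normalized tables (invented values) for demonstration
demo_dir=$(mktemp -d)
printf 'gene,S1,S2\nG1,10,40\nG2,20,80\nG3,30,120\nG4,40,160\n' > "$demo_dir/raw.csv"
printf 'gene,S1,S2\nG1,20,20\nG2,40,40\nG3,60,60\nG4,80,80\n' > "$demo_dir/norm.csv"

echo "raw:"; sample_medians "$demo_dir/raw.csv"
echo "normalized:"; sample_medians "$demo_dir/norm.csv"
```

Medians only summarize the center of each distribution; the plots also show spread and shape, which this check does not capture.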

Take a look at the rest of the visualizations (i.e., the distance plot, expression heatmap, and volcano plot) and consider what conclusions they support.
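In a volcano plot, each gene is typically placed at x = log2 fold change and y = -log10 of its p-value, so genes that are both strongly changed and highly significant land in the upper left and upper right corners. Whether the script plots raw or adjusted p-values is not stated here; the sketch below assumes padj and the same hypothetical DESeq2-like column layout (gene in column 1, log2FoldChange in column 3, padj in column 7). The function name volcano_coords and the toy rows are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a DESeq2-like results CSV into volcano-plot coordinates.
# Rows with padj "NA" or padj of exactly 0 (which would give -log10 = infinity) are skipped.
volcano_coords() {
  awk -F',' '
    NR == 1 { print "gene,x_log2FC,y_neglog10_padj"; next }
    $7 != "NA" && $7 + 0 > 0 {
      # -log10(p) = -ln(p) / ln(10); awk only provides the natural logarithm
      printf "%s,%s,%.2f\n", $1, $3, -log($7) / log(10)
    }
  ' "$1"
}

# Toy results table (invented values) for demonstration
demo_csv=$(mktemp)
printf 'gene,baseMean,log2FoldChange,lfcSE,stat,pvalue,padj\nGENE_A,1500,5.2,0.4,13.0,1e-38,1e-35\nGENE_D,950,1.1,0.3,3.7,2e-04,0.01\n' > "$demo_csv"

volcano_coords "$demo_csv"
```

A padj of 0.01 maps to y = 2.00, so the log2FoldChange and adjusted p-value thresholds used earlier correspond to vertical lines at x = -4.5 and x = 4.5 and a horizontal line at y = 2 on such a plot.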