Lesson 13 Practice
In this practice session, participants will filter and perform some quality checks on the HBR-UHR gene expression data.
Before getting started, connect to Biowulf and start an interactive session with 12 GB of memory and 10 GB of local temporary storage.
Change into the /data/user/hbr_uhr_b4b folder.
Create a directory called hbr_uhr_deg to store the differential expression analysis outputs.
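The directory step can be sketched as follows. This is a minimal example that assumes the current working directory is already /data/user/hbr_uhr_b4b; the `-p` flag is a convenience added here so the command does not fail if the folder already exists.

```shell
# Run from inside /data/user/hbr_uhr_b4b (assumed to be the current directory).
# -p: do not raise an error if hbr_uhr_deg already exists.
mkdir -p hbr_uhr_deg

# Confirm that the directory is present.
ls -d hbr_uhr_deg
```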
In which folder is the HBR-UHR gene expression data table stored, and what is the file name?
Solution
The gene expression table is stored in the folder `hbr_uhr_expression/`. The name of the gene expression table is `hbr_uhr_gene_expression.csv`. To reference this from `hbr_uhr_b4b`, use `hbr_uhr_expression/hbr_uhr_gene_expression.csv`.
Load R.
The scripts are in the folder b4b_script.
Filter low expressing genes out of hbr_uhr_gene_expression.csv. Set the minimum number of samples per group that have expression greater than 0 to 2. Assign hbr_uhr as the study name and write the output to hbr_uhr_deg.
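To make the filtering rule concrete, here is a small illustration. It is not the lesson's script from b4b_script. The helper name `filter_low_expression` is hypothetical, and the column layout (gene ID in column 1, one group in columns 2 to 4, the other in columns 5 to 7) is an assumption about the table. It also reads "per group" as meaning every group must meet the minimum, which may differ from what the lesson's script does.

```shell
# Hypothetical helper, for illustration only (not the lesson's script).
# Usage: filter_low_expression <csv_file> <min_samples_per_group>
# Assumed layout: column 1 = gene ID, columns 2-4 = group A, columns 5-7 = group B.
filter_low_expression() {
  awk -F',' -v min="$2" '
    NR == 1 { print; next }                          # keep the header line
    {
      a = 0; b = 0
      for (i = 2; i <= 4; i++) if ($i + 0 > 0) a++   # expressed samples in group A
      for (i = 5; i <= 7; i++) if ($i + 0 > 0) b++   # expressed samples in group B
      if (a >= min && b >= min) print                # keep the gene if both groups pass
    }' "$1"
}
```

With a minimum of 2, a gene with counts `1,2,0` in one group and `3,0,4` in the other would be kept, while a gene with counts `0,0,5` in one group would be dropped because only one sample in that group has expression above 0.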
Solution
How many genes are in the filtered expression table?
Solution
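One way to arrive at the count is to count the lines in the filtered CSV file and subtract one for the header. The sketch below wraps this in a hypothetical helper named `count_genes`; it assumes the file has exactly one header line and ends with a newline.

```shell
# Hypothetical helper: number of genes = number of lines minus the header line.
# Usage: count_genes <path to the filtered CSV in hbr_uhr_deg>
count_genes() {
  n_lines=$(wc -l < "$1")      # total number of lines, header included
  echo $(( n_lines - 1 ))      # subtract the single header line
}
```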
Since the filtered gene expression CSV file has 562 lines, one of which is the header, 561 genes are left after filtering.
Run QC on the filtered expression data. Assign hbr_uhr as the study name and write the output to the folder hbr_uhr_deg.
Solution
After running QC, download the images to the Downloads folder of a personal computer to view the results. To do this, open a new terminal (Mac) or command prompt window (Windows 10 or above) and change into the local Downloads folder.
Then use the scp command construct below to download. Remember to replace user with the participant's assigned Biowulf student ID.
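A sketch of what such a command might look like is given here. The remote host name `helix.nih.gov` is an assumption, since this page does not state which host to use; follow the instructor's guidance if it differs. The snippet only builds and prints the command so it can be checked before it is run in the local Downloads folder.

```shell
user="user"                   # placeholder: replace with the assigned Biowulf student ID
remote_host="helix.nih.gov"   # assumption: the host name is not given in this lesson

# -r copies the hbr_uhr_deg directory recursively; "." is the current local folder.
scp_cmd="scp -r ${user}@${remote_host}:/data/${user}/hbr_uhr_b4b/hbr_uhr_deg ."

echo "$scp_cmd"               # print the command for review, then paste it to run
```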
Use the Mac Finder or Windows Explorer to navigate to hbr_uhr_deg in the local Downloads directory to begin exploring the results.
Does it look like the samples are separated by biology?
Solution
Yes, it appears that biology is driving the difference between the HBR and UHR samples, which are separated along the first principal component axis.
What does the distance plot show?
Solution
The distance plot is not optimal: samples within a group do not sit as close together as would be ideal.
How does the expression distribution among samples look?
Solution
From the density and box plots, the expression distributions are not equal among samples. Hopefully normalization improves this.
