Lesson 15 Practice
Objectives
Previously, we performed QC on the Golden Snidget RNA sequencing data, aligned the sequencing reads to its genome, and obtained expression counts. We can now finally perform differential expression analysis, to find out which genes are differentially expressed between the EXCITED and BORED states of the Golden Snidget.
Scripts used for differential analysis
Recall that the scripts used for differential expression analysis are in the folder /usr/local/code. We can also access this folder using the environmental variable CODE. How many scripts are in this folder (find out by not using the full path to the folder containing the scripts).
Make sure we change into ~/biostar_class/snidget before starting.
Create a folder to store the Golden Snidget differential expression analysis results
First, create a folder to store the Golden Snidget differential expression analysis results. Name this folder snidget_deg.
Create the design file
Create the design.csv file using the nano editor. Recall that the design files contain nothing more than a column with sample names and a column informing of sample treatment condition. Some of the R helper scripts require a csv version of this, where the columns are separated by comma. Here, we create both before moving on.
Creating design.csv
Copy the text below to the nano editor, hit control-x and save to return to the terminal.
sample,condition
BORED_1.bam,BORED
BORED_2.bam,BORED
BORED_3.bam,BORED
EXCITED_1.bam,EXCITED
EXCITED_2.bam,EXCITED
EXCITED_3.bam,EXCITED
Obtaining the counts table
Change into ~/biostar_class/snidget/snidget_hisat2/ when running featureCounts to obtain the expression counts table. The annotation file for this dataset is in ~/biostar_class/snidget/refs and is named features.gff. How would we construct featureCounts to obtain an expression counts table for the Golden Snidget?
Solution
Format the Golden Snidget counts table for differential expression analysis
Remember that the deseq2.r script requires that the expression counts table be in csv format. It is currently in tab delimited format as generated by featureCounts. There is also a header line that we need to get rid of in the counts table. Finally, recall that our expression counts table is stored as counts.txt in the ~/biostar_class/snidget/snidget_deg directory, so change into this before moving forward.
Do you remember how to remove the header line in the counts table?
Solution
Save the counts table without header, we will need it later.Next, how do we remove columns 2 through 6 of the counts table and convert it from tab delimited to csv?
Solution
Again, save the counts table without header, we will need it later.Now that the correctly formated counts table is generated. Let's see if we can remember how to run deseq2.r to generate the differential expression results.
Take a look at the results.csv file, which contains the differential expression analysis output. Can we sorted by largest to smallest fold change? In the sorted results table, what do you notice? How well do the fold change results match expected?
Visualizing expression
Let's create an expression heatmap. How do we do this? Looking at the heatmap, do the treatments (ie. BORED and EXCITED) cluster well together?