Lesson 10 Practice

In this session, participants will practice generating a gene expression matrix for the HBR-UHR data.

Sign in to Biowulf and change into the /data/user/hbr_uhr_b4b folder (replace user with your own Biowulf username).

Solution
ssh user@biowulf.nih.gov
cd /data/user/hbr_uhr_b4b

Create a new folder called hbr_uhr_expression.

Solution
mkdir hbr_uhr_expression

Request an interactive session with 12 GB of RAM and 10 GB of local temporary storage space.

Solution
sinteractive --mem=12gb --gres=lscratch:10

Generate a gene expression table using featureCounts for all of the samples in one go. Change into /data/user/hbr_uhr_b4b/hbr_uhr_hisat2 for this.

Solution
module load subread
featureCounts -p --countReadPairs -a ../references/22.gtf -g gene_name -o ../hbr_uhr_expression/hbr_uhr_gene_expression.txt *.bam
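Before reshaping the table, it can help to confirm which column numbers hold the sample counts. featureCounts output starts with a comment line beginning with #, followed by a header line with Geneid, Chr, Start, End, Strand, and Length, and then one column per BAM file. The sketch below is only an illustration: it builds a small mock table (the file name demo_counts.txt, the sample names, and the values are all made up) and prints each header field with its column number. The same awk line could be pointed at the real output file instead.

```shell
# Build a tiny stand-in for featureCounts output (hypothetical names and values).
demo_dir=$(mktemp -d)
demo_table="$demo_dir/demo_counts.txt"
{
  printf '# mock comment line standing in for the featureCounts information line\n'
  printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample_A.bam\tsample_B.bam\n'
  printf 'GENE1\tchr22\t100\t200\t+\t101\t5\t7\n'
} > "$demo_table"

# Line 2 is the header; print every tab-separated field with its column number.
numbered_header=$(awk -F'\t' 'NR == 2 { for (i = 1; i <= NF; i++) print i ": " $i }' "$demo_table")
printf '%s\n' "$numbered_header"
```

In this mock table the sample columns start at column 7, which matches the layout described above: six annotation columns first, then the counts.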

Change back to the hbr_uhr_expression folder after the gene expression matrix has been generated.

Solution
cd ../hbr_uhr_expression

Convert the gene expression matrix hbr_uhr_gene_expression.txt to a CSV file. The CSV should omit the first line of the file, which contains featureCounts information, and keep only the following columns:

  • Gene name
  • Columns for the expression of each sample

Do this in a single command, using | (pipe) to chain the steps and avoid writing intermediate files.

Solution
sed '1d' hbr_uhr_gene_expression.txt | cut -f1,7-12 | tr '\t' ',' > hbr_uhr_gene_expression.csv

Make sure the result is correct.

Solution
column -t -s ',' hbr_uhr_gene_expression.csv | less -S
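Viewing the file in less is a visual check. As a complementary sketch, the same sed, cut, and tr pipeline can be run on a small mock table and checked by counting fields per row. Everything in the mock (the file names demo_counts.txt and demo_counts.csv, the sample names s1.bam to s6.bam, and the counts) is hypothetical and only mirrors the featureCounts layout of one comment line, one header line, six annotation columns, and six sample columns.

```shell
# Build a mock featureCounts table with six sample columns (hypothetical values).
demo_dir=$(mktemp -d)
{
  printf '# mock comment line standing in for the featureCounts information line\n'
  printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\ts1.bam\ts2.bam\ts3.bam\ts4.bam\ts5.bam\ts6.bam\n'
  printf 'GENE1\tchr22\t100\t200\t+\t101\t1\t2\t3\t4\t5\t6\n'
  printf 'GENE2\tchr22\t300\t400\t-\t101\t0\t0\t9\t8\t7\t6\n'
} > "$demo_dir/demo_counts.txt"

# Same steps as the solution: drop line 1, keep columns 1 and 7-12, tabs to commas.
sed '1d' "$demo_dir/demo_counts.txt" | cut -f1,7-12 | tr '\t' ',' > "$demo_dir/demo_counts.csv"

# Every row should have the same number of fields: gene name plus six samples.
field_counts=$(awk -F',' '{ print NF }' "$demo_dir/demo_counts.csv" | sort -u)
cat "$demo_dir/demo_counts.csv"
echo "distinct field counts per row: $field_counts"
```

If the field-count check reports a single value of 7, every row has the gene name plus six sample columns. Running the awk line on hbr_uhr_gene_expression.csv gives the same kind of check on the real data.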