
Lesson 7: Help session

Lesson recap

This lesson taught us how to download data from the web at the Unix command line, how to view file contents, and how to search for patterns in files.

Practice questions

Question 1:

In your data directory, make a folder called lesson7_practice and change into it.

Solution

mkdir lesson7_practice
cd lesson7_practice

Question 2:

Next, go to the course data section of the class documentation. Download 22_transcriptome.fa into the lesson7_practice folder using wget. This file contains the sequences of the transcripts found on human chromosome 22.

Solution

wget https://btep.ccr.cancer.gov/docs/unix-on-biowulf-2023/data/22_transcriptome.fa

Question 3:

Next, go to the course data section of the class documentation again. Download 22.gtf into the lesson7_practice folder using curl. This is the genomic annotation file for human chromosome 22, which tells us where features such as genes, transcripts, exons, and coding sequences are located along the chromosome.

Solution

curl https://btep.ccr.cancer.gov/docs/unix-on-biowulf-2023/data/22.gtf -o 22.gtf

OR

curl -O https://btep.ccr.cancer.gov/docs/unix-on-biowulf-2023/data/22.gtf

Question 4:

Print the first six lines of 22_transcriptome.fa.

Solution

head -n 6 22_transcriptome.fa
>ENST00000615943.1 loc:chr22|10736171-10736283|- exons:10736171-10736283 segs:1-113
ATCACTTCTCGGCCTTTTGGCTAAGATCAACTGTAGTATCTGTTGTTATTAATATAATATTGTATATTCA
ACCAATTGTCAATACAAGGCTGTTTGTATCTGATATGAACCAA
>ENST00000618365.1 loc:chr22|10936023-10936161|- exons:10936023-10936161 segs:1-139
AGCATGCCCAGTTAATTTGAAATTTCAGATAAACAAATACTTTTTTCAGTGTAAGTATATCCCATACAAT
ATTTGGGACATGCTTATACTAAAATATTATTCCTTATTTATCTGAAATTGAAATTTAACTGGGTATTAC

Question 5:

Print the last eight lines of 22_transcriptome.fa.

Solution

tail -n 8 22_transcriptome.fa
>ENST00000427528.1 loc:chr22|50798655-50799123|+ exons:50798655-50799123 segs:1-469
ATGGCACCAAAAGCGAAGGAAGCTCCTGCTCATCCTAAAGCCGAAGCCAAAGCGAAGGCTTTAAAGGCCA
AGAAGGCAGTGTTGAAAGGTGTCCGCAGCCACACGCAAAAAAGAAGATCCGCATGTCACTCACCTTCAGG
CGGCCCAAGACACTGCGACTCCGGAGGCAGCCCAGATATCCTCGGAAGAGCACCCCCAGGAGAAACAAGC
TTGGCCACTATGCTATCATCAAGTTTCCGCTGGCCACTGAGTCGGCCGTGAAGAAGATAGAAGAAAACAA
CACGCTTGTGTTCACTGTGGATGTTAAAGCCAACAAGCACCAGATCAGACAGGCTGTGAAGAAGCTCTAT
GACAGTGATGTGGCCAAGGTCACCACCCTGATTTGTCCTGATAAGGAGAACAAGGCATATGTTCGACTTG
CTCCTGATTATGATGCTTTCGATGTTGTAACAAAATTGGGATCACCTAA
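
Note that head and tail count lines, not sequences, so their output can start or stop partway through a FASTA record. The sketch below illustrates this on a small made-up file (sample.fa and its sequences are hypothetical, not part of the course data) and counts records by searching for header lines, which begin with ">".

```shell
# Made-up three-record FASTA file, for illustration only.
printf '>seq1\nACGT\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\nTTAA\n' > sample.fa

# head -n 4 returns four lines: the full first record plus the header of the second.
head -n 4 sample.fa

# Count records by counting lines that start with ">".
grep -c '^>' sample.fa
```

The same grep -c command should work on 22_transcriptome.fa, assuming every record in that file has a single header line starting with ">".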

Question 6:

Can you find the transcript ENST00000615943.1 in the file 22.gtf? What is the name of the gene from which it is derived?

Solution

grep ENST00000615943.1 22.gtf

This transcript comes from the gene U2, which encodes an snRNA.
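
The grep output for a GTF file can be long, so it may help to pull out only the gene name. The following is a minimal sketch, assuming the annotation stores the name in a gene_name "..." attribute (common in Ensembl and GENCODE GTF files, but not confirmed here for 22.gtf). It runs on a tiny stand-in file, sample.gtf, whose coordinates and gene IDs are made up for illustration.

```shell
# Two-line stand-in for 22.gtf; coordinates and gene_id values are made up.
printf 'chr22\tENSEMBL\ttranscript\t100\t200\t.\t-\t.\tgene_id "GENE_A"; transcript_id "ENST00000615943.1"; gene_name "U2";\n' > sample.gtf
printf 'chr22\tENSEMBL\ttranscript\t300\t400\t.\t+\t.\tgene_id "GENE_B"; transcript_id "ENST00000000000.1"; gene_name "OTHER";\n' >> sample.gtf

# -F treats the pattern as a literal string, so "." is not read as a regex wildcard.
# grep -o prints only the matching gene_name attribute; sort -u removes duplicates.
grep -F 'ENST00000615943.1' sample.gtf | grep -o 'gene_name "[^"]*"' | sort -u
```

To try this on the real data, replace sample.gtf with 22.gtf. If nothing is printed, the attribute is probably named differently in that file, and the plain grep output will show the actual attribute names.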