Lesson 7: Help session
Lesson recap
This lesson covered how to download data from the web in Unix, how to view file contents, and how to search for patterns in files.
Practice questions
Question 1:
Make a folder in your data directory called lesson7_practice and change into it.
Solution
mkdir lesson7_practice
cd lesson7_practice
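If the folder may already exist from an earlier attempt, a slightly more defensive variant can be sketched as below. It assumes you are already inside your data directory; the -p flag and the pwd check are additions for illustration and are not required by the question.

```shell
# -p: do not report an error if lesson7_practice already exists
mkdir -p lesson7_practice
cd lesson7_practice
# Print the working directory to confirm it now ends in lesson7_practice
pwd
```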
Question 2:
Next, go to the course data section of the class documentation. Download 22_transcriptome.fa into the lesson7_practice folder using wget. This file contains the sequences of the transcripts found on human chromosome 22.
Solution
wget https://btep.ccr.cancer.gov/docs/unix-on-biowulf-2023/data/22_transcriptome.fa
Question 3:
Next, go to the course data section of the class documentation. Download 22.gtf into the lesson7_practice folder using curl. This is the genomic annotation file for human chromosome 22, which tells us where features such as genes, transcripts, exons, and coding sequences are located along the genome.
Solution
curl https://btep.ccr.cancer.gov/docs/unix-on-biowulf-2023/data/22.gtf -o 22.gtf
OR
curl -O https://btep.ccr.cancer.gov/docs/unix-on-biowulf-2023/data/22.gtf
Question 4:
Print the first six lines of 22_transcriptome.fa.
Solution
head -n 6 22_transcriptome.fa
>ENST00000615943.1 loc:chr22|10736171-10736283|- exons:10736171-10736283 segs:1-113
ATCACTTCTCGGCCTTTTGGCTAAGATCAACTGTAGTATCTGTTGTTATTAATATAATATTGTATATTCA
ACCAATTGTCAATACAAGGCTGTTTGTATCTGATATGAACCAA
>ENST00000618365.1 loc:chr22|10936023-10936161|- exons:10936023-10936161 segs:1-139
AGCATGCCCAGTTAATTTGAAATTTCAGATAAACAAATACTTTTTTCAGTGTAAGTATATCCCATACAAT
ATTTGGGACATGCTTATACTAAAATATTATTCCTTATTTATCTGAAATTGAAATTTAACTGGGTATTAC
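The output above shows that each record begins with a header line starting with ">". Pattern searching can use that to count or list the records. The sketch below is illustrative only: so that it runs without the download, it writes the six lines printed above into a hypothetical file named sample_transcriptome.fa. On the real data you would point grep at 22_transcriptome.fa instead.

```shell
# Hypothetical sample file holding the six lines shown in the head output
cat > sample_transcriptome.fa <<'EOF'
>ENST00000615943.1 loc:chr22|10736171-10736283|- exons:10736171-10736283 segs:1-113
ATCACTTCTCGGCCTTTTGGCTAAGATCAACTGTAGTATCTGTTGTTATTAATATAATATTGTATATTCA
ACCAATTGTCAATACAAGGCTGTTTGTATCTGATATGAACCAA
>ENST00000618365.1 loc:chr22|10936023-10936161|- exons:10936023-10936161 segs:1-139
AGCATGCCCAGTTAATTTGAAATTTCAGATAAACAAATACTTTTTTCAGTGTAAGTATATCCCATACAAT
ATTTGGGACATGCTTATACTAAAATATTATTCCTTATTTATCTGAAATTGAAATTTAACTGGGTATTAC
EOF

# Count header lines: ^ anchors the match to the start of the line
grep -c '^>' sample_transcriptome.fa

# Print only the header lines
grep '^>' sample_transcriptome.fa
```

The pattern is quoted so that the shell does not treat ">" as output redirection, which would overwrite the file.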
Question 5:
Print the last eight lines of 22_transcriptome.fa.
Solution
tail -n 8 22_transcriptome.fa
>ENST00000427528.1 loc:chr22|50798655-50799123|+ exons:50798655-50799123 segs:1-469
ATGGCACCAAAAGCGAAGGAAGCTCCTGCTCATCCTAAAGCCGAAGCCAAAGCGAAGGCTTTAAAGGCCA
AGAAGGCAGTGTTGAAAGGTGTCCGCAGCCACACGCAAAAAAGAAGATCCGCATGTCACTCACCTTCAGG
CGGCCCAAGACACTGCGACTCCGGAGGCAGCCCAGATATCCTCGGAAGAGCACCCCCAGGAGAAACAAGC
TTGGCCACTATGCTATCATCAAGTTTCCGCTGGCCACTGAGTCGGCCGTGAAGAAGATAGAAGAAAACAA
CACGCTTGTGTTCACTGTGGATGTTAAAGCCAACAAGCACCAGATCAGACAGGCTGTGAAGAAGCTCTAT
GACAGTGATGTGGCCAAGGTCACCACCCTGATTTGTCCTGATAAGGAGAACAAGGCATATGTTCGACTTG
CTCCTGATTATGATGCTTTCGATGTTGTAACAAAATTGGGATCACCTAA
Question 6:
Can you find the transcript ENST00000615943.1 in the file 22.gtf? What is the name of the gene from which it is derived?
Solution
grep ENST00000615943.1 22.gtf
This transcript comes from the gene U2, which codes for an snRNA.
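Matching GTF lines are long, so the gene name can be hard to spot by eye. One way to pull it out is sketched below. It assumes the attribute column contains a field of the form gene_name "..."; check this against your own grep output. To stay runnable without the download, it uses a hypothetical one-line file, sample.gtf, whose coordinates come from the FASTA header and whose source and gene_id values are placeholders, not real annotation.

```shell
# Hypothetical stand-in for one record of 22.gtf (9 tab-separated columns).
# SOURCE and GENE_ID_PLACEHOLDER are made-up values for illustration.
printf 'chr22\tSOURCE\ttranscript\t10736171\t10736283\t.\t-\t.\tgene_id "GENE_ID_PLACEHOLDER"; transcript_id "ENST00000615943.1"; gene_name "U2";\n' > sample.gtf

# -F: treat the ID as a fixed string, so "." is not a regex wildcard
# -o: print only the part of each line that matches the pattern
# sort -u: collapse repeated hits into one line
grep -F 'ENST00000615943.1' sample.gtf | grep -o 'gene_name "[^"]*"' | sort -u
```

On the sample line this prints gene_name "U2". Replacing sample.gtf with 22.gtf should apply the same filter to the real annotation, provided the assumption about the gene_name field holds.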