Lesson 7 Practice
In Lesson 7, you learned how to download and work with archived and compressed files. To practice what you have learned, we will use the ERCC spike-in control data, which Istvan Albert, creator of the Biostar Handbook, has reframed as the "Golden Snidget", "a magical golden bird with fully rotational wings."
Task 1: Create a new directory
Create a directory called golden and change into it.
Solution
mkdir golden
cd golden
Task 2: Download the "Golden Snidget" reference data
Now, grab the reference files from http://data.biostarhandbook.com/books/rnaseq/data/golden.genome.tar.gz. This time, try out wget. If you aren't sure how to use wget, how might you find out?
Solution
wget -nc http://data.biostarhandbook.com/books/rnaseq/data/golden.genome.tar.gz
What does the -nc flag do?
man wget
wget --help
-nc stands for "no clobber", which keeps wget from downloading and overwriting an existing file of the same name.
Unpack the reference genome (golden.genome.tar.gz).
Solution
tar -xvf golden.genome.tar.gz
What did this produce?
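One way to answer this is to list an archive's members before or after unpacking. The sketch below is a minimal illustration on a toy archive built in a temporary directory; the refs/genome.fa layout and the file names are assumptions made for the demo, not the actual contents of golden.genome.tar.gz.

```shell
# Toy stand-in for a downloaded tarball, built in a temporary directory.
# The refs/ layout and file names are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/refs"
printf '>seq1\nACGT\n' > "$demo/refs/genome.fa"
tar -czf "$demo/toy.tar.gz" -C "$demo" refs

# -t lists the archive members without extracting anything.
listing=$(tar -tf "$demo/toy.tar.gz")
echo "$listing"

# After extraction, ls -R shows what was written to disk.
mkdir "$demo/out"
tar -xf "$demo/toy.tar.gz" -C "$demo/out"
ls -R "$demo/out"
```

On the real data, the equivalent commands would be tar -tf golden.genome.tar.gz followed by ls -R on whatever directory the archive unpacks to.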
Just for fun, let's re-archive and compress the data we just unpacked, and name the result funtimes.ref.tar.gz. How might we do this using tar? Check the help information. Alternatively, try Google.
Solution
tar --help
tar -czvf funtimes.ref.tar.gz refs
What are the file sizes of golden.genome.tar.gz and funtimes.ref.tar.gz?
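A hedged sketch of how the sizes could be compared, using toy archives in a temporary directory rather than the real downloads (the file names original.tar.gz and repacked.tar.gz are placeholders for golden.genome.tar.gz and funtimes.ref.tar.gz). ls -lh prints human-readable sizes, and wc -c gives exact byte counts. The two sizes are usually close but need not match exactly, since the archives can differ in stored metadata and compression settings.

```shell
# Build a toy archive, unpack it, and repack it, mirroring the steps above.
# All paths here are hypothetical and live in a temporary directory.
demo=$(mktemp -d)
mkdir -p "$demo/src/refs"
for i in $(seq 1 500); do echo "ACGTACGTACGTACGT"; done > "$demo/src/refs/genome.fa"
tar -czf "$demo/original.tar.gz" -C "$demo/src" refs

mkdir "$demo/work"
tar -xf "$demo/original.tar.gz" -C "$demo/work"
tar -czf "$demo/repacked.tar.gz" -C "$demo/work" refs

# Human-readable sizes for both archives.
ls -lh "$demo/original.tar.gz" "$demo/repacked.tar.gz"

# Exact byte counts, for a precise comparison.
original_bytes=$(wc -c < "$demo/original.tar.gz" | tr -d ' ')
repacked_bytes=$(wc -c < "$demo/repacked.tar.gz" | tr -d ' ')
echo "original: $original_bytes bytes, repacked: $repacked_bytes bytes"
```

For the exercise itself, ls -lh golden.genome.tar.gz funtimes.ref.tar.gz is enough.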
Task 3: Download the "Golden Snidget" reads
Get the reads from http://data.biostarhandbook.com/books/rnaseq/data/golden.reads.tar.gz. You will also need to unpack the file.
Solution
wget -nc http://data.biostarhandbook.com/books/rnaseq/data/golden.reads.tar.gz
tar -xvf golden.reads.tar.gz
What did this produce? List the file contents.
Solution
ls -lh
Lastly, compress the FASTQ files using gzip.
Solution
gzip reads/*.fq
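To confirm that the compression worked, it can help to check that each .fq file was replaced by a .fq.gz file, test the compressed files, and peek at a record without decompressing to disk. The following is a minimal sketch on two made-up FASTQ files in a temporary directory; the names sample_R1.fq and sample_R2.fq are hypothetical and do not reflect the actual Golden Snidget read files.

```shell
# Two tiny, made-up FASTQ files standing in for the real reads.
demo=$(mktemp -d)
mkdir -p "$demo/reads"
printf '@read1\nACGT\n+\nIIII\n' > "$demo/reads/sample_R1.fq"
printf '@read2\nTTGA\n+\nIIII\n' > "$demo/reads/sample_R2.fq"

# gzip replaces each input file with a compressed .gz version.
gzip "$demo"/reads/*.fq
ls -lh "$demo/reads"

# -t tests the integrity of each compressed file without writing output.
gzip -t "$demo"/reads/*.fq.gz

# -dc decompresses to standard output, leaving the .gz file untouched.
first_record=$(gzip -dc "$demo/reads/sample_R1.fq.gz" | head -n 4)
echo "$first_record"
```

To get the original files back, gunzip reads/*.fq.gz (or gzip -d) reverses the compression.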