Lesson 7 Practice

In Lesson 7, you learned how to download and work with archived and compressed files. To practice what you have learned, we will use the ERCC spike-in control data, which Istvan Albert, creator of the Biostar Handbook, has reframed as the "Golden Snidget", "a magical golden bird with fully rotational wings."

Task 1: Create a new directory

Create a directory called golden and change into it.

Solution

mkdir golden  
cd golden

Task 2: Download the "Golden Snidget" reference data

Now, grab the reference files from http://data.biostarhandbook.com/books/rnaseq/data/golden.genome.tar.gz. This time try out wget. If you aren't sure how to use wget, how might you find out?

Solution

wget -nc http://data.biostarhandbook.com/books/rnaseq/data/golden.genome.tar.gz

What does the -nc flag do?

man wget
or
wget --help  
-nc stands for "no clobber", which keeps wget from downloading and overwriting an existing file of the same name.

Unpack the reference genome (golden.genome.tar.gz).

Solution

tar -xvf golden.genome.tar.gz  

What did this produce?
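
One way to answer this is to list an archive's contents with tar's -t option. The sketch below builds a tiny throwaway archive so that it runs anywhere, without a download. The names demo_refs, genome.fa, and demo.tar.gz are made up for illustration and are not the actual contents of golden.genome.tar.gz.

```shell
# Work in a scratch directory so nothing in golden/ is touched.
demo=$(mktemp -d)
mkdir "$demo/demo_refs"
printf '>seq1\nACGT\n' > "$demo/demo_refs/genome.fa"

# Create a small gzipped archive (a hypothetical stand-in for the real download).
tar -czf "$demo/demo.tar.gz" -C "$demo" demo_refs

# -t lists the contents without extracting, -z handles gzip, -f names the archive.
listing=$(tar -tzf "$demo/demo.tar.gz")
echo "$listing"
```

Running the same listing command on golden.genome.tar.gz should show which directory and files the unpacking step created.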

Just for fun, let's re-archive and compress the data we just unpacked, and name the result funtimes.ref.tar.gz. How might we do this using tar? Check the help information. Alternatively, try searching online.

Solution

tar --help  
tar -czvf funtimes.ref.tar.gz refs 

What are the file sizes of golden.genome.tar.gz and funtimes.ref.tar.gz?
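
A minimal sketch of how file sizes could be compared, using ls -lh for human-readable sizes and wc -c for exact byte counts. It runs on a small made-up dataset (demo_refs, genome.fa, first.tar.gz, and second.tar.gz are all hypothetical names) rather than the real archives.

```shell
# Build a scratch dataset; repetitive text compresses well.
demo=$(mktemp -d)
mkdir "$demo/demo_refs"
awk 'BEGIN { for (i = 0; i < 5000; i++) print "ACGTACGTACGT" }' > "$demo/demo_refs/genome.fa"

# Archive the same directory twice, mimicking the original and the re-archived copy.
tar -czf "$demo/first.tar.gz" -C "$demo" demo_refs
tar -czf "$demo/second.tar.gz" -C "$demo" demo_refs

# Human-readable sizes appear in the fifth column.
ls -lh "$demo"/*.tar.gz

# Exact byte counts, handy for a precise comparison.
raw_bytes=$(wc -c < "$demo/demo_refs/genome.fa")
first_bytes=$(wc -c < "$demo/first.tar.gz")
second_bytes=$(wc -c < "$demo/second.tar.gz")
echo "raw=$raw_bytes first=$first_bytes second=$second_bytes"
```

For the exercise itself, ls -lh golden.genome.tar.gz funtimes.ref.tar.gz would show both sizes side by side.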

Task 3: Download the "Golden Snidget" reads

Get the reads from http://data.biostarhandbook.com/books/rnaseq/data/golden.reads.tar.gz. You will also need to unpack the file.

Solution

wget -nc http://data.biostarhandbook.com/books/rnaseq/data/golden.reads.tar.gz

tar -xvf golden.reads.tar.gz  

What did this produce? List the contents of the current directory to find out.

Solution

ls -lh  

Lastly, compress the FASTQ files using gzip.

Solution

gzip reads/*.fq
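
After compressing, it can be worth confirming that the files are intact and still readable. The sketch below shows one way to do that on a tiny made-up FASTQ file (sample.fq is a hypothetical name, not one of the Golden Snidget read files).

```shell
# Create a scratch reads directory with a single one-record FASTQ file.
demo=$(mktemp -d)
mkdir "$demo/reads"
printf '@read1\nACGT\n+\nIIII\n' > "$demo/reads/sample.fq"

# gzip replaces each input file with a compressed .gz version.
gzip "$demo"/reads/*.fq
ls -lh "$demo/reads"

# -t tests the integrity of the compressed file without writing any output.
gzip -t "$demo/reads/sample.fq.gz" && echo "compressed file OK"

# -d decompresses and -c writes to standard output, leaving the .gz file in place.
first_line=$(gzip -dc "$demo/reads/sample.fq.gz" | head -n 1)
echo "$first_line"
```

Note that gzip removes the original .fq files by default, so only the .fq.gz files should remain in reads/ afterwards.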