
Lesson 5 Practice

The following can be used to practice skills learned in Lesson 5.

Login to Biowulf

If you are already logged in, exit the remote connection and reconnect. Remember, you must be on the NIH network.

Solution

ssh username@biowulf.nih.gov

Let's run fastqc, a quality control program, on the files we downloaded from the SRA.

Using sbatch

Start a new script named fastqc.sh in the same directory where you downloaded the data in Lesson 5.

The command you will include in the script is as follows:

mkdir fastqc
fastqc -o ./fastqc/ -t 4 *.fastq

These commands create a directory named fastqc inside the current working directory and write the FastQC results there. The -t 4 option runs FastQC with four threads, and the *.fastq pattern selects every FASTQ file in the working directory.
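FastQC names each report after its input file, with the .fastq extension removed and _fastqc.html and _fastqc.zip appended. After the job finishes, you can use that pattern to confirm that every input produced a report. The sketch below assumes the ./fastqc/ output directory used above. The function names expected_fastqc_outputs and check_fastqc_outputs are hypothetical helpers. They are not part of FastQC.

```shell
#!/bin/bash
# Hypothetical helper: list the reports FastQC should write for every *.fastq
# file in the current directory (input name minus .fastq, plus _fastqc.html/.zip).
# The body runs in a subshell ( ... ) so the nullglob setting does not leak out.
expected_fastqc_outputs() (
    outdir=${1:-fastqc}
    shopt -s nullglob            # no matches -> empty loop, not a literal "*.fastq"
    for f in *.fastq; do
        base=${f%.fastq}
        echo "$outdir/${base}_fastqc.html"
        echo "$outdir/${base}_fastqc.zip"
    done
)

# Hypothetical helper: print each expected report that does not exist yet.
# Returns 1 if anything is missing, 0 otherwise.
check_fastqc_outputs() {
    local path missing=0
    while IFS= read -r path; do
        if [[ ! -e $path ]]; then
            echo "missing: $path"
            missing=1
        fi
    done < <(expected_fastqc_outputs "${1:-fastqc}")
    return "$missing"
}
```

Run check_fastqc_outputs from the data directory once the job has left the queue. It prints nothing and returns 0 when every report is present.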

You need to edit this script before you can submit it as a job on Biowulf. What is missing?

Solution

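#!/bin/bash
# Request 4 CPUs for the job (#SBATCH lines must come before the first command)
#SBATCH --cpus-per-task=4

# Load FastQC into the job's environment
module load fastqc
# Create the output directory, then run FastQC with 4 threads to match --cpus-per-task
mkdir fastqc
fastqc -o ./fastqc/ -t 4 *.fastq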
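The solution repeats the number 4 in two places. One common alternative is to read the thread count from SLURM_CPUS_PER_TASK, which Slurm sets inside a job submitted with --cpus-per-task. With that change, you only edit the #SBATCH line when you change the CPU request. This is a sketch, not the lesson's official script. The helper names fastqc_threads and run_fastqc are hypothetical, the fallback of 1 thread outside a job is an assumption, and mkdir -p is used so that resubmitting does not report an error.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=4
# Sketch: same job as the solution, but the thread count follows the Slurm allocation.

# Hypothetical helper: Slurm sets SLURM_CPUS_PER_TASK in jobs submitted with
# --cpus-per-task; fall back to 1 thread when it is unset (assumption).
fastqc_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Hypothetical helper wrapping the lesson's commands.
run_fastqc() {
    module load fastqc
    mkdir -p fastqc                # -p: no error if the directory already exists
    fastqc -o ./fastqc/ -t "$(fastqc_threads)" *.fastq
}

# Do the work only inside a Slurm job (Slurm sets SLURM_JOB_ID), so running
# the file elsewhere has no side effects.
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
    run_fastqc
fi
```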

Submit the job.

Solution

sbatch fastqc.sh
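sbatch prints the new job's ID. If you want to keep that ID in a variable for squeue or scancel, you can use sbatch's --parsable option, which prints just the ID, optionally followed by ";cluster". The sketch below accepts both that form and the default "Submitted batch job <id>" line. The names parse_job_id and submit_job are hypothetical helpers.

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from sbatch output, accepting
# "Submitted batch job 12345" as well as the --parsable forms "12345" and "12345;cluster".
parse_job_id() {
    local out=$1
    out=${out##* }       # keep the last space-separated word
    echo "${out%%;*}"    # drop an optional ";cluster" suffix
}

# Hypothetical helper: submit a batch script and print only its job ID.
submit_job() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    parse_job_id "$out"
}
```

For example, jobid=$(submit_job fastqc.sh) stores the ID, so you can later run squeue -j "$jobid" or scancel "$jobid".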

How can we check on our job? What is the job's status? How much memory is it using?

Solution

squeue -u $USER  
sjobs -u $USER  
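squeue and sjobs show the job's state at a single moment. If you would rather wait in the terminal until the job leaves the queue, a small loop can poll squeue. The -h option suppresses the header, so any output line means the job is still pending or running. wait_for_job is a hypothetical helper name, and the 30-second default interval is an arbitrary choice intended to keep load on the shared scheduler low. For live memory usage of running jobs, Biowulf also documents a jobload command.

```shell
#!/bin/bash
# Hypothetical helper: block until squeue no longer lists the given job
# (finished, failed, or cancelled). Polls every $2 seconds (default 30).
wait_for_job() {
    local jobid=$1 interval=${2:-30}
    # squeue -h prints no header; empty output means the job is gone.
    # Errors (e.g. an ID that has already been purged) are discarded.
    while squeue -h -j "$jobid" 2>/dev/null | grep -q .; do
        sleep "$interval"
    done
    echo "job $jobid is no longer in the queue"
}
```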

How can we cancel this job?

Solution

scancel job-id

where job-id is the ID of the job. If you are unsure of the job ID, check the output of squeue -u $USER.
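A mistyped ID passed to scancel can target the wrong job or none at all. The sketch below adds two guards before cancelling. First, it checks that the argument looks like a plain numeric job ID (array IDs such as 123_4 are deliberately not accepted). Second, it confirms that squeue still lists the job. cancel_job is a hypothetical helper, and its return codes (2 for a malformed ID, 1 for a job not in the queue) are choices made for this sketch.

```shell
#!/bin/bash
# Hypothetical helper: cancel a job only if the ID is numeric and the job
# is still listed by squeue.
cancel_job() {
    local jobid=$1
    if [[ ! $jobid =~ ^[0-9]+$ ]]; then
        echo "not a job ID: $jobid" >&2
        return 2
    fi
    if squeue -h -j "$jobid" 2>/dev/null | grep -q .; then
        scancel "$jobid"
    else
        echo "job $jobid is not in the queue" >&2
        return 1
    fi
}
```

To cancel all of your own jobs at once, use scancel -u $USER instead.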

Accessing the Biostars module on Biowulf

We have created a Biostars module on Biowulf for your convenience. Instructions for using this module are here, and the software included in the module is listed here.

Let's get an interactive session and see how we can use the module.

sinteractive  
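Once sinteractive returns a prompt, you should be on a compute node rather than the login node. Slurm sets SLURM_JOB_ID inside interactive sessions and batch jobs, so a quick check like the one below can confirm where you are before you start heavy work. in_slurm_job is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeed only when running inside a Slurm allocation.
in_slurm_job() {
    [[ -n "${SLURM_JOB_ID:-}" ]]
}

if in_slurm_job; then
    echo "In Slurm job ${SLURM_JOB_ID} on $(hostname)"
else
    echo "Not in a Slurm job; you are probably still on the login node"
fi
```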

We have also created a script that sets up your shell and loads the module. To use it, run:

source /data/classes/BTEP/apps/biostars/1.0/run_biostars.sh  

The script creates a $DATA environment variable that holds the path to many of the files used in class. We try to keep this up to date, but please let us know if something is missing.

ls $DATA  
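If you use $DATA in your own scripts, an unset variable makes ls $DATA quietly list the current directory instead. The sketch below fails early with a message if $DATA is unset or does not point to a directory. check_data_dir is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: verify that $DATA is set (by run_biostars.sh) and is a directory.
check_data_dir() {
    if [[ -z "${DATA:-}" ]]; then
        echo "DATA is not set; source run_biostars.sh first" >&2
        return 1
    fi
    if [[ ! -d $DATA ]]; then
        echo "DATA=$DATA is not a directory" >&2
        return 1
    fi
}
```

For example, check_data_dir && ls "$DATA" lists the class files only when the variable is usable.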

Let's try a command.

fastqc -h  
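Running fastqc -h confirms that one tool is available. To check several tools at once after sourcing the setup script, you can test whether each command is on your PATH. check_tools is a hypothetical helper. fastqc comes from this lesson, and you would pass the names of any other tools you plan to use.

```shell
#!/bin/bash
# Hypothetical helper: report each named command that is not on PATH.
# Returns 1 if any are missing, 0 if all are found.
check_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found: $tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, check_tools fastqc prints nothing and returns 0 when the module has been loaded correctly.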

Additional practice materials from hpc.nih.gov