Lesson 5 Practice
The following can be used to practice skills learned in Lesson 5.
Log in to Biowulf
If you are already logged in, exit the remote connection and reconnect. Remember, you must be on the NIH network.
Solution
ssh username@biowulf.nih.gov
Let's run fastqc, a quality control program, on the files we downloaded from the SRA.
Using sbatch
Start a new script named fastqc.sh in the same directory where you downloaded the data in Lesson 5.
The command you will include in the script is as follows:
mkdir fastqc
fastqc -o ./fastqc/ -t 4 *.fastq
This command creates a directory named fastqc inside the current working directory and writes the fastqc results there. It runs with four threads (-t 4) and processes every fastq file in your working directory.
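Because the shell expands *.fastq before fastqc starts, it can help to confirm how many files the pattern matches before you submit anything. The following is a minimal sketch, assuming a POSIX-style shell; the function name count_fastq is hypothetical and not part of the lesson.

```shell
# Hypothetical helper: count the .fastq files in a directory (default: current).
count_fastq() {
    dir=${1:-.}
    # Expand the glob into this function's positional parameters.
    set -- "$dir"/*.fastq
    # If nothing matched, the pattern stays unexpanded and names no real file.
    if [ ! -e "$1" ]; then
        echo 0
    else
        echo "$#"
    fi
}

echo "fastq files in current directory: $(count_fastq .)"
```

If the count is 0, you are probably not in the directory that holds the Lesson 5 downloads.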
You need to edit this script before you can submit it as a job on Biowulf. What is missing?
Solution
#!/bin/bash
#SBATCH --cpus-per-task=4
module load fastqc
mkdir fastqc
fastqc -o ./fastqc/ -t 4 *.fastq
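As an optional variation on the solution above, the sketch below writes a slightly more defensive copy of the script from the command line and checks its syntax. The file name fastqc_defensive.sh is hypothetical. The sketch assumes that Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested, so the thread count follows the #SBATCH line instead of being typed twice.

```shell
# Write a hypothetical variant of the job script. The quoted 'EOF' keeps
# $SLURM_CPUS_PER_TASK from being expanded while the file is written.
cat > fastqc_defensive.sh <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=4

module load fastqc

# -p: do not complain if the output directory already exists
mkdir -p fastqc

# Use the CPU count Slurm allocated; fall back to 4 outside of a job
fastqc -o ./fastqc/ -t "${SLURM_CPUS_PER_TASK:-4}" *.fastq
EOF

# Parse the script without executing it, to catch syntax errors before submitting.
bash -n fastqc_defensive.sh && echo "syntax OK"
```

The original fastqc.sh works as written; this variant only makes rerunning the job in the same directory quieter.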
Submit the job.
Solution
sbatch fastqc.sh
How can we check on our job? What is the job's status? How much memory is it using?
Solution
squeue -u $USER
sjobs -u $USER
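If the default squeue listing is more than you need, its -o option lets you choose the columns. The sketch below is one possible way to do this; the helper name job_state is hypothetical, and the guard lets the snippet run harmlessly on a machine without Slurm. Note that squeue may no longer list a job once it has finished.

```shell
# Hypothetical helper: print only the state (e.g. PENDING, RUNNING) of one job.
# -h drops the header, -j selects the job, -o "%T" prints the state column.
job_state() {
    squeue -h -j "$1" -o "%T"
}

# Only call Slurm where it is available (e.g. on Biowulf).
if command -v squeue >/dev/null 2>&1; then
    # List your jobs with id, name, state, and elapsed time.
    squeue -u "$USER" -o "%i %j %T %M"
else
    echo "squeue not found; run this on Biowulf"
fi
```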
How can we cancel this job?
Solution
scancel job-id
Here, job-id is the id of the job. Check the output of squeue -u $USER if you are unsure what the job id is.
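Another option is to capture the job id at submission time so you do not have to look it up later. The sketch below assumes sbatch prints either a bare job id or a line such as "Submitted batch job 12345"; the helper name extract_jobid is hypothetical.

```shell
# Hypothetical helper: pull the first run of digits out of sbatch's output.
# Handles both "Submitted batch job 12345" and a bare "12345".
extract_jobid() {
    printf '%s\n' "$1" | grep -oE '[0-9]+' | head -n 1
}

# Submit only where sbatch and the script exist (e.g. on Biowulf).
if command -v sbatch >/dev/null 2>&1 && [ -f fastqc.sh ]; then
    jobid=$(extract_jobid "$(sbatch fastqc.sh)")
    echo "submitted job $jobid"
    # To cancel it later: scancel "$jobid"
else
    echo "sbatch or fastqc.sh not found; nothing submitted"
fi
```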
Accessing the Biostars module on Biowulf
We have created a Biostars module on Biowulf for your convenience. Instructions for using this module are here, and the software included in the module is listed here.
Let's get an interactive session and see how we can use the module.
sinteractive
We have also created a script that sets up your shell and loads the module. To use it, run
source /data/classes/BTEP/apps/biostars/1.0/run_biostars.sh
This creates a $DATA environment variable holding the path to many of the files from class. We try to update this, but please let us know if something is missing.
ls $DATA
Let's try a command.
fastqc -h
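To confirm that the setup script did what you expect, you can check that the tools are on your PATH and that $DATA points to a real directory. This is a minimal sketch; the helper name check_tool is hypothetical and not something the module provides.

```shell
# Hypothetical helper: report whether a command can be found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

check_tool fastqc

# DATA is created by run_biostars.sh; it is empty if the script was not sourced.
if [ -n "${DATA:-}" ] && [ -d "$DATA" ]; then
    echo "DATA points to: $DATA"
else
    echo "DATA is not set or is not a directory; source run_biostars.sh first"
fi
```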