Lesson 5: Practice questions

Before starting these practice questions, check the present working directory with pwd and, if needed, change into the /data/username folder (replace username with your student account ID).

Question 1

Make a directory called SRP475677 and change into it.

Solution

mkdir SRP475677

cd SRP475677

Question 2

Submit a swarm script (name it SRP475677.swarm) to download the first 1000 sequences for each of the following SRA accessions. The runs were sequenced in paired-end mode.

SRR27044727
SRR27044728
SRR27044729
SRR27044733
SRR27044734

Solution

nano SRP475677.swarm

#SWARM --job-name SRP475677
#SWARM --sbatch "--mail-type=ALL --mail-user=username@nih.gov"
#SWARM --partition=student
#SWARM --gres=lscratch:15
#SWARM --module sratoolkit

fastq-dump --split-files -X 1000 SRR27044727
fastq-dump --split-files -X 1000 SRR27044728
fastq-dump --split-files -X 1000 SRR27044729
fastq-dump --split-files -X 1000 SRR27044733
fastq-dump --split-files -X 1000 SRR27044734

Press Ctrl+X, then "y", then Enter to save and exit nano.

swarm -f SRP475677.swarm
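
The five fastq-dump lines differ only in the accession, so the swarm file could also be generated with a loop instead of being typed in nano. The following is a minimal sketch of that idea, not part of the required solution; the variable name swarm_file is an assumption made for illustration, and the directives are copied from the solution above.

```shell
# Name of the swarm file to create (same name as in the question).
swarm_file="SRP475677.swarm"

# Write the #SWARM directives first; the quoted 'EOF' keeps the text literal.
cat > "$swarm_file" <<'EOF'
#SWARM --job-name SRP475677
#SWARM --sbatch "--mail-type=ALL --mail-user=username@nih.gov"
#SWARM --partition=student
#SWARM --gres=lscratch:15
#SWARM --module sratoolkit

EOF

# Append one fastq-dump command per accession.
for acc in SRR27044727 SRR27044728 SRR27044729 SRR27044733 SRR27044734; do
    printf 'fastq-dump --split-files -X 1000 %s\n' "$acc" >> "$swarm_file"
done

# Print the result so it can be reviewed before submission.
cat "$swarm_file"
```

The generated file is submitted the same way, with swarm -f SRP475677.swarm.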

Question 3

Submit a shell script (name it SRP475677.sh) to run seqkit stats for the FASTQ files that were just downloaded.

Solution

nano SRP475677.sh

#!/bin/bash
#SBATCH --job-name=SRP475677_stats
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@nih.gov
#SBATCH --mem=1gb
#SBATCH --partition=student
#SBATCH --time=00:02:00
#SBATCH --output=SRP475677_stats_log

#LOAD REQUIRED MODULES
module load seqkit

#CREATE TEXT FILE TO STORE THE seqkit stats OUTPUT
touch SRP475677_stats.txt

#CREATE A FOR LOOP TO LOOP THROUGH THE FASTQ FILES AND GENERATE STATISTICS
#Use ">>" to redirect and append output to a file
#Quote "$file" so that unusual file names are passed to seqkit intact
for file in *.fastq
do
    seqkit stats "$file" >> SRP475677_stats.txt
done

Press Ctrl+X, then "y", then Enter to save and exit nano.

sbatch SRP475677.sh
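
The loop in this script processes whatever *.fastq files are present, so it is worth confirming that the swarm job from Question 2 has finished and left both mate files for every accession. The sketch below assumes that fastq-dump --split-files named the files ACCESSION_1.fastq and ACCESSION_2.fastq, which is its usual behavior for paired-end runs; check_pairs is a hypothetical helper written for this lesson, not a Biowulf or SRA Toolkit command.

```shell
# check_pairs DIR ACCESSION...
# Prints every expected mate file that is missing or empty, and returns
# a non-zero status if at least one was reported.
check_pairs() {
    dir=$1
    shift
    missing=0
    for acc in "$@"; do
        for mate in 1 2; do
            f="$dir/${acc}_${mate}.fastq"
            if [ ! -s "$f" ]; then
                echo "missing or empty: $f"
                missing=$((missing + 1))
            fi
        done
    done
    [ "$missing" -eq 0 ]
}

# Usage: run from inside the SRP475677 directory.
check_pairs . SRR27044727 SRR27044728 SRR27044729 SRR27044733 SRR27044734 \
    || echo "some FASTQ files are not in place yet"
```

If nothing is printed, all ten files exist and are non-empty.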

Question 4

What command is used to view the text file containing the seqkit stats results for the FASTQ files downloaded?

Solution

cat SRP475677_stats.txt
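
Because each download was limited with -X 1000, the num_seqs column in the seqkit output should not exceed 1000 for any file. As a rough cross-check that does not depend on seqkit, the records can also be counted from the line count. This sketch assumes the standard layout of exactly four lines per FASTQ record; count_fastq_records is a hypothetical helper name.

```shell
# count_fastq_records FILE
# Prints the number of records in FILE, assuming four lines per record
# (header, sequence, "+", qualities).
count_fastq_records() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Report the record count for every FASTQ file in the current directory.
for file in *.fastq; do
    [ -e "$file" ] || continue   # skip the literal pattern if nothing matches
    printf '%s\t%s\n' "$file" "$(count_fastq_records "$file")"
done
```

The counts printed here can be compared with the num_seqs values in SRP475677_stats.txt.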

Question 5

How can the shell script in Question 3 be changed to obtain FastQC results?

Solution

nano SRP475677_fastqc.sh

#!/bin/bash
#SBATCH --job-name=SRP475677_fastqc
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@nih.gov
#SBATCH --mem=1gb
#SBATCH --partition=student
#SBATCH --time=01:00:00
#SBATCH --output=SRP475677_fastqc_log

#LOAD REQUIRED MODULES
module load fastqc

#CREATE A FOR LOOP TO LOOP THROUGH THE FASTQ FILES AND RUN FASTQC

for file in *.fastq
do
    fastqc "$file"
done

sbatch SRP475677_fastqc.sh
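
Once the job has finished, one way to confirm that every FASTQ file was processed is to look for its report. The sketch below assumes FastQC's default output naming, where SAMPLE.fastq produces SAMPLE_fastqc.html in the same directory; missing_fastqc_reports is a hypothetical helper name.

```shell
# missing_fastqc_reports DIR
# Prints every FASTQ file in DIR that has no matching FastQC HTML report.
missing_fastqc_reports() {
    for fq in "$1"/*.fastq; do
        [ -e "$fq" ] || continue            # no FASTQ files in DIR
        report="${fq%.fastq}_fastqc.html"   # expected report name
        [ -e "$report" ] || echo "$fq"
    done
}

# Usage: run from inside the SRP475677 directory.
missing_fastqc_reports .
```

No output means every FASTQ file has a report. Any file that is listed can be investigated using SRP475677_fastqc_log.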