Lesson 5: Practice questions
For these practice questions, check the present working directory (pwd) and, if needed, change into the /data/username folder (where username is the student account ID).
Question 1
Make a directory called SRP475677 and change into it.
Solution
mkdir SRP475677
cd SRP475677
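The same steps can be written a little more defensively. This is only a sketch: WORKDIR is a hypothetical variable standing in for /data/username and is not part of the lesson. It falls back to the current directory when unset.

```shell
# Assumption: WORKDIR is a hypothetical stand-in for /data/username.
# If it is not set, the current directory is used instead.
workdir="${WORKDIR:-$PWD}"
cd "$workdir" || exit 1

# -p makes mkdir succeed even if the directory already exists
mkdir -p SRP475677

# Stop here if the directory could not be entered
cd SRP475677 || exit 1

# Confirm the present working directory
pwd
```

Using mkdir -p means the commands can be repeated without an error if the folder was already created in an earlier attempt.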
Question 2
Submit a swarm script (name it SRP475677.swarm) to download the first 1000 sequences for each of the following accessions from SRA. The samples were sequenced in paired-end mode, so the forward and reverse reads should be written to separate files.
- SRR27044727
- SRR27044728
- SRR27044729
- SRR27044733
- SRR27044734
Solution
nano SRP475677.swarm
#SWARM --job-name SRP475677
#SWARM --sbatch "--mail-type=ALL --mail-user=username@nih.gov"
#SWARM --partition=student
#SWARM --gres=lscratch:15
#SWARM --module sratoolkit
fastq-dump --split-files -X 1000 SRR27044727
fastq-dump --split-files -X 1000 SRR27044728
fastq-dump --split-files -X 1000 SRR27044729
fastq-dump --split-files -X 1000 SRR27044733
fastq-dump --split-files -X 1000 SRR27044734
Press Ctrl-X, then "y", then Enter to save and exit nano.
swarm -f SRP475677.swarm
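Instead of typing each line in nano, the swarm file could also be generated with a loop, which helps when the accession list is long. The sketch below is one possible approach rather than the lesson's method. It writes the same directives and fastq-dump commands used in the solution, and the variable names (accessions, acc) are illustrative.

```shell
# Accessions listed in Question 2
accessions="SRR27044727 SRR27044728 SRR27044729 SRR27044733 SRR27044734"

# Write the #SWARM directives first (same as in the solution above).
# The quoted 'EOF' keeps the shell from expanding anything in the block.
cat > SRP475677.swarm <<'EOF'
#SWARM --job-name SRP475677
#SWARM --sbatch "--mail-type=ALL --mail-user=username@nih.gov"
#SWARM --partition=student
#SWARM --gres=lscratch:15
#SWARM --module sratoolkit
EOF

# Append one fastq-dump command per accession
for acc in $accessions; do
    echo "fastq-dump --split-files -X 1000 $acc" >> SRP475677.swarm
done

# Inspect the result before submitting it with swarm -f
cat SRP475677.swarm
```

The generated file is submitted the same way as the hand-written one.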
Question 3
Submit a shell script (name it SRP475677.sh) to run seqkit stats on the FASTQ files that were just downloaded.
Solution
nano SRP475677.sh
#!/bin/bash
#SBATCH --job-name=SRP475677_stats
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@nih.gov
#SBATCH --mem=1gb
#SBATCH --partition=student
#SBATCH --time=00:02:00
#SBATCH --output=SRP475677_stats_log
#LOAD REQUIRED MODULES
module load seqkit
#CREATE TEXT FILE TO STORE THE seqkit stats OUTPUT
touch SRP475677_stats.txt
#LOOP THROUGH THE FASTQ FILES AND GENERATE STATISTICS
#Use ">>" to redirect and append output to a file
#Quote "$file" so file names with spaces or special characters are handled safely
for file in *.fastq
do
    seqkit stats "$file" >> SRP475677_stats.txt
done
Press Ctrl-X, then "y", then Enter to save and exit nano.
sbatch SRP475677.sh
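The script relies on ">>" appending to the output file on each pass through the loop. The difference from ">" can be seen in a small sketch that needs no cluster modules. Here wc -l stands in for seqkit stats, and the two mock FASTQ files and the output file names are hypothetical.

```shell
# Work in a throwaway directory so no real files are touched
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Two hypothetical mock FASTQ files, one 4-line record each
printf '@read1\nACGT\n+\nIIII\n' > sample_1.fastq
printf '@read1\nTGCA\n+\nIIII\n' > sample_2.fastq

# ">" truncates the file on every iteration, so only the last result survives
for file in *.fastq; do
    wc -l "$file" > overwrite.txt
done

# ">>" appends, so every iteration adds a line
for file in *.fastq; do
    wc -l "$file" >> append.txt
done

echo "--- overwrite.txt ---"
cat overwrite.txt
echo "--- append.txt ---"
cat append.txt
```

Because ">>" never clears the file, submitting the Question 3 script a second time adds a second copy of the statistics to SRP475677_stats.txt. Deleting the text file before resubmitting avoids the duplicates.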
Question 4
What command is used to view the text file containing the seqkit stats
results for the FASTQ files downloaded?
Solution
cat SRP475677_stats.txt
Question 5
How can the shell script in Question 3 be changed to obtain FastQC results?
Solution
Load the fastqc module instead of seqkit and run fastqc inside the loop. No output text file is needed because FastQC writes its own report files. The time limit is also raised from 2 minutes to 1 hour.
nano SRP475677_fastqc.sh
#!/bin/bash
#SBATCH --job-name=SRP475677_fastqc
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@nih.gov
#SBATCH --mem=1gb
#SBATCH --partition=student
#SBATCH --time=01:00:00
#SBATCH --output=SRP475677_fastqc_log
#LOAD REQUIRED MODULES
module load fastqc
#LOOP THROUGH THE FASTQ FILES AND RUN FASTQC
#Quote "$file" so file names with spaces or special characters are handled safely
for file in *.fastq
do
    fastqc "$file"
done
Press Ctrl-X, then "y", then Enter to save and exit nano.
sbatch SRP475677_fastqc.sh
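Once the job finishes, it can be useful to confirm that every FASTQ file has a report. By default FastQC writes an HTML report and a zip archive next to each input, named after the input with a _fastqc suffix. The sketch below assumes that default naming. It runs in a temporary directory with empty, hypothetical stand-in files, so every report is reported as missing here.

```shell
# Work in a throwaway directory; in practice this would be the SRP475677 folder
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical empty stand-ins for two of the downloaded files
touch SRR27044727_1.fastq SRR27044727_2.fastq

missing=0
for file in *.fastq; do
    # Strip the .fastq extension and add the suffix FastQC uses by default
    report="${file%.fastq}_fastqc.html"
    if [ -f "$report" ]; then
        echo "found:   $report"
    else
        echo "missing: $report"
        missing=$((missing + 1))
    fi
done

echo "$missing report(s) missing"
```

Run in the real project folder without the mktemp and touch lines, a count of zero suggests that FastQC completed for every file.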