19. Short Read Aligners copy
This page uses material directly from the Biostar Handbook by Istvan Albert.
Always remember to activate the bioinformatics environment.
conda activate bioinfo
To align sequences to a genome, we need (1) genome sequence and the (2) sequence reads.
Let's start by creating a new directory for our work and downloading sequence data for the Ebola genome.
mkdir -p refs
efetch -db nuccore -format fasta -id AF086833 > refs/AF086833.fa
bwa index refs/AF086833.fa
We'll need some sequence data files to align to this genome. (Don't put them in the refs directory).
First, we can find data using the "PRJNA", or project number.
esearch -db sra -query PRJNA257197 | efetch -format runinfo > runinfo.csv
We'll pick one run from this file and download just 10K reads.
Remember to use "--split-files" to get both the forward and reverse reads in separate files.
fastq-dump -X 10000 --split-files SRR1972739
The bwa program can be run in either single-end or paired-end mode by specifying at the command line.
For single-end mode:
bwa mem refs/AF086833.fa SRR1972739_1.fastq > SRR1972739.sam
bwa mem refs/AF086833.fa refs/SRR1972739_1.fastq refs/SRR1972739_2.fastq > bwa.sam
samtools view -S -b bwa.sam > bwa.bam
samtools sort bwa.bam -o sorted_bwa.bam
samtools index sorted_bwa.bam
Summary of what we did:
- Downloaded Ebola genome in FASTA format (.fa). (AF086833.fa)
- Created an index file (.fai) for the Ebola genome using bwa index.
- Downloaded RNA-Seq reads in FASTQ (.fastq) format. (SRR1972739_1.fastq, SRR1972739_2.fastq)
- Use bwa mem to align the Ebola genome (.fa) and RNA-Seq reads (.fastq), output to SAM (.sam) file.
- Convert SAM (.sam) to BAM (.bam), a binary version of SAM (.sam).
- Sort the BAM (.bam) files into genome order.
- Create a BAM (.bam) index file (.bai), using samtools index.
- Load the Ebola genome (AF086833.fa) into IGV (Genomes - Load Genome From File).
- Load the BAM (.bam) file into IGV using the "Load File" option (sorted_output.bam).
To use bowtie2 for this analysis, please see Using bowtie2 to align sequence reads