20. Multiple Sequence Aligners
This page uses content directly from the Biostars Handbook by Istvan Albert.
Always remember to activate the bioinformatics environment.
conda activate bioinfo
How do we align more than two sequences?
Let's download the Ebola virus genomes from BioProject PRJNA257197.
mkdir -p genomes
esearch -db nuccore -query PRJNA257197 | efetch -format fasta > genomes/ebola.fa
seqkit stat genomes/ebola.fa
file format type num_seqs sum_len min_len avg_len max_len
genomes/ebola.fa FASTA DNA 249 4,705,429 18,613 18,897.3 18,959
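To make the seqkit stats columns concrete, here is a minimal sketch that computes the same summary (number of sequences, total, minimum, average and maximum length) with plain awk. The fasta_stats function is a hypothetical helper written only for illustration. It assumes well-formed FASTA input and is not how seqkit itself works.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a FASTA file (or stdin) with the same columns
# that seqkit stats reports: num_seqs, sum_len, min_len, avg_len, max_len.
fasta_stats() {
  awk '
    function record(l) {                 # store the length of one finished record
      n++; sum += l
      if (n == 1 || l < min) min = l
      if (n == 1 || l > max) max = l
    }
    /^>/ { if (len != "") record(len); len = 0; next }   # header starts a new record
    { gsub(/[ \t\r]/, ""); len += length($0) }            # add residues on this line
    END {
      if (len != "") record(len)                          # flush the last record
      if (n == 0) { print "num_seqs=0"; exit }
      printf "num_seqs=%d sum_len=%d min_len=%d avg_len=%.1f max_len=%d\n", n, sum, min, sum / n, max
    }
  ' "$@"
}

# Demo on a tiny inline FASTA; record c is wrapped over two lines.
printf '>a desc\nACGTACGT\n>b\nACGTACGTAC\n>c\nACGTAC\nGTACGT\n' | fasta_stats
# prints: num_seqs=3 sum_len=30 min_len=8 avg_len=10.0 max_len=12
```

Running fasta_stats genomes/ebola.fa should agree with the num_seqs and length columns above, since both simply count the residues in each record.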
seqkit seq -n genomes/ebola.fa | cut -f 1 -d ' ' | head -10 > ids.txt
less ids.txt
KR105345.1
KR105328.1
KR105323.1
KR105302.1
KR105295.1
KR105294.1
KR105282.1
KR105266.1
KR105263.1
KR105253.1
seqkit grep --pattern-file ids.txt genomes/ebola.fa > small.fa
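The seqkit grep --pattern-file step keeps every record whose ID matches one of the lines in ids.txt. A minimal awk sketch of the same idea is shown below. It assumes exact matches on the first word of each header, which is roughly what seqkit does by default, though seqkit supports many more matching options. The select_by_id name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the FASTA records whose ID (first word of the
# header, without ">") appears in an ID list with one ID per line.
# The ID list must not be empty, because the NR == FNR idiom relies on it.
# Usage: select_by_id ids.txt input.fa > subset.fa
select_by_id() {
  awk '
    NR == FNR { if (NF) want[$1] = 1; next }          # first file: load wanted IDs
    /^>/ { id = substr($1, 2); keep = (id in want) }  # header: decide for this record
    keep                                              # print lines while keep is 1
  ' "$1" "$2"
}

tmp=$(mktemp -d)
printf 'b\n' > "$tmp/ids.txt"
printf '>a one\nAC\n>b two\nGG\nTT\n>c\nCA\n' > "$tmp/in.fa"
select_by_id "$tmp/ids.txt" "$tmp/in.fa"
# prints the three lines of record b: ">b two", "GG", "TT"
rm -r "$tmp"
```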
Let's see what we got.
wc -l small.fa
wc -l genomes/ebola.fa
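Keep in mind that wc -l counts lines, not sequences: each FASTA record is one header line plus however many lines its sequence is wrapped over. Counting the header lines gives the number of records instead. Here is a small sketch; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report both the number of lines and the number of
# FASTA records (header lines starting with ">") in a file.
count_lines_and_records() {
  local lines records
  lines=$(wc -l < "$1" | tr -d ' ')   # tr strips the padding some wc versions add
  records=$(grep -c '^>' "$1")
  printf 'lines=%s records=%s\n' "$lines" "$records"
}

fa=$(mktemp)
printf '>s1\nACGTACGT\nACGT\n>s2\nGGCC\n' > "$fa"
count_lines_and_records "$fa"
# prints: lines=5 records=2
rm "$fa"
```

Since ids.txt holds 10 IDs taken from genomes/ebola.fa itself, you would expect count_lines_and_records small.fa to report records=10.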
Next, we need to install MAFFT.
Get the installation packages here:
https://mafft.cbrc.jp/alignment/software/source.html
MAFFT is also available from Bioconda:
conda install -c bioconda mafft
Run MAFFT on the small file of Ebola genomes we created. The --clustalout option writes the alignment in Clustal format instead of MAFFT's default aligned FASTA.
mafft --clustalout small.fa > alignment.aln
head -20 alignment.aln
tail -20 alignment.aln
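Without --clustalout, MAFFT writes aligned FASTA, where every sequence has been padded with "-" gap characters to the same length (for example mafft small.fa > alignment.fa). As a rough illustration of what an alignment lets us measure, the sketch below counts the columns that are identical across all sequences. The conserved_columns helper is hypothetical, not a MAFFT feature, and it assumes aligned FASTA input in which all sequences have equal length.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for an aligned FASTA file (all sequences the same
# length, gaps written as "-"), count the columns where every sequence has
# the same non-gap character.
conserved_columns() {
  awk '
    /^>/ { n++; next }                                     # next sequence
    { gsub(/[ \t\r]/, ""); seq[n] = seq[n] toupper($0) }   # join wrapped lines
    END {
      len = length(seq[1]); same = 0
      for (i = 1; i <= len; i++) {
        c = substr(seq[1], i, 1)
        ok = (c != "-")                                    # gap columns never count
        for (s = 2; s <= n && ok; s++)
          if (substr(seq[s], i, 1) != c) ok = 0
        if (ok) same++
      }
      printf "sequences=%d columns=%d conserved=%d\n", n, len, same
    }
  ' "$@"
}

printf '>s1\nACGT-A\n>s2\nACGTTA\n>s3\nAC-TTA\n' | conserved_columns
# prints: sequences=3 columns=6 conserved=4
```

Note that a column containing a gap in any sequence is never counted as conserved, so closely related genomes with many indels will show fewer conserved columns than their raw similarity might suggest.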