This page uses content directly from the Biostars Handbook by Istvan Albert (https://www.biostarhandbook.com).

Always remember to load the bioinformatics environment.

conda activate bioinfo

SAM files

SAM is a TAB-delimited, line-oriented, human-readable text format with two sections:

  1. Header section - metadata, one record per line (each header line starts with "@").
  2. Alignment section - each line provides alignment information.
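To make the two sections concrete, here is a minimal sketch that separates header lines from alignment lines using only grep and awk. The tiny SAM file it writes is made up for illustration (the reference name ref1 and the reads read1 and read2 are hypothetical), and the only fact it relies on is that header lines start with "@".

```shell
# Write a tiny, made-up SAM file: two header lines and two alignment lines.
sam=$(mktemp)
printf '@HD\tVN:1.6\tSO:coordinate\n' > "$sam"
printf '@SQ\tSN:ref1\tLN:100\n' >> "$sam"
printf 'read1\t0\tref1\t5\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"
printf 'read2\t16\tref1\t20\t60\t4M\t*\t0\t0\tTTGA\tIIII\n' >> "$sam"

# Header lines start with "@"; every other line is an alignment line.
header_count=$(grep -c '^@' "$sam")
alignment_count=$(grep -vc '^@' "$sam")
echo "header lines: $header_count, alignment lines: $alignment_count"

# For each alignment line print the read name (column 1),
# the flag (column 2) and the leftmost position (column 4).
fields=$(awk -F '\t' '!/^@/ { print $1, $2, $4 }' "$sam")
echo "$fields"

rm -f "$sam"
```

With real data, "samtools view -H" prints only the header and "samtools view" prints only the alignments.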

SAM format specification on Github

SAM files store alignments in a standardized format. Once the alignments are sorted, converted to BAM and indexed, they can be accessed quickly by coordinate.

Decoding SAM flags (Picard) - use this utility to identify the properties of a read based on SAM flag values, or to find out what SAM flag value would be given a combination of properties.
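The same decoding can be sketched in plain shell, since a SAM flag is just a sum of powers of two. The function name decode_flag is an assumption made for this example; the bit order and names follow the SAM specification and match the names printed by "samtools flags".

```shell
# decode_flag FLAG
# Print the names of the bits that are set in a numeric SAM flag.
# The body runs in a subshell so its variables do not leak out.
decode_flag() (
    flag=$1
    bit=1
    out=""
    for name in PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE \
                READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY; do
        # "bit" takes the values 1, 2, 4, ... 2048 in step with the names.
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
        bit=$(( bit * 2 ))
    done
    echo "$out"
)

decode_flag 4      # UNMAP
decode_flag 20     # UNMAP,REVERSE
decode_flag 2304   # SECONDARY,SUPPLEMENTARY
```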

BAM files

BAM files are a binary, compressed, machine-readable representation of the SAM format.

BAM files are typically sorted by alignment coordinate (or by read name). A coordinate-sorted BAM file can be indexed for quick access.

BAM files are created from SAM files. You may be able to download them directly from some data sites or create them yourself.

Tools to manipulate BAM files include:

  1. samtools
  2. bamtools
  3. picard

In BAM files, you may be looking for:

  - alignments that match an attribute such as strand, mate or mapping quality, or
  - alignments within a certain region of the genome


Creating SAM and BAM files.

#SAM files are created by alignment programs such as bowtie2 and bwa.
bwa mem reference_sequence sequence_1.fastq sequence_2.fastq > alignment.sam

#Convert SAM to sorted BAM with samtools.
samtools sort alignment.sam > alignment.bam

#Index the BAM file with samtools.
samtools index alignment.bam

How to extract a section of the BAM file?

Using data we downloaded previously:

bwa mem refs/AF086833.fa SRR1972739_1.fastq SRR1972739_2.fastq  > SRR1972739.bwa.sam
samtools view -S -b SRR1972739.bwa.sam > SRR1972739.bwa.bam
samtools sort SRR1972739.bwa.bam -o sorted_SRR1972739.bwa.bam
samtools index sorted_SRR1972739.bwa.bam
samtools view -b sorted_SRR1972739.bwa.bam AF086833:3050-3199 > selected.bam
samtools index selected.bam

  1. Use "bwa mem" to do the alignment of 2 fastq files (forward and reverse reads) to the reference (fasta) sequence. Output is a "sam" file.
  2. Use "samtools view" to convert the "sam" (human readable) file to a "bam" (machine readable) file.
  3. Use "samtools sort" to sort the "bam" file before indexing.
  4. Index the sorted "bam" file with "samtools index".
  5. Use "samtools view" to select a portion of the genome to view based on genome co-ordinates.
  6. Index the selected interval with "samtools index" to create the ".bai" file needed by IGV.

Load selected.bam in IGV and view the selected interval.



Select from or filter data from BAM files

  1. Selecting means to keep alignments that match a condition.
  2. Filtering means to remove alignments that match a condition.

Filtering on flags can be done via samtools by passing the "-f" and "-F" parameters.

-f flag (include only alignments where ALL bits of the flag are set)

-F flag (include only alignments where NONE of the bits of the flag are set)
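The two rules can be written as bit tests. The sketch below is an illustration in plain shell, not samtools code. The helper name passes_filter is hypothetical, and the example flags 83, 77 and 99 are ordinary paired-end flag values chosen for the demonstration.

```shell
# passes_filter FLAG REQUIRED EXCLUDED
# Succeeds when FLAG has every REQUIRED bit set (the -f rule)
# and none of the EXCLUDED bits set (the -F rule).
passes_filter() {
    [ $(( $1 & $2 )) -eq "$2" ] && [ $(( $1 & $3 )) -eq 0 ]
}

# 83 = PAIRED + PROPER_PAIR + REVERSE + READ1 (1 + 2 + 16 + 64)
passes_filter 83 16 4 && echo "83 is kept by -f 16 -F 4"
passes_filter 83 0 20 || echo "83 is removed by -F 20"

# 77 = PAIRED + UNMAP + MUNMAP + READ1 (1 + 4 + 8 + 64)
passes_filter 77 4 0 && echo "77 is kept by -f 4"
```

Passing 0 for REQUIRED or EXCLUDED means that no condition of that kind is applied.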

samtools flags 4

#0x4 4 UNMAP
#therefore when flag 4 is set, the read is **unmapped/unaligned**

View alignments where read did not align. Then count them.

samtools view -f 4 SRR1972739.bwa.bam | head
samtools view -c -f 4 SRR1972739.bwa.bam

#5461

-f 4 selects unmapped reads: read unmapped (0x4).

-c counts the alignments instead of printing them, so -c -f 4 counts the alignments whose reads are unmapped (unaligned).

Now we can reverse the condition (from -f to -F) and count the alignments that are mapped.

samtools view -c -F 4 SRR1972739.bwa.bam

#15279

To select forward alignments.

#filter out unmapped (4) and reverse (16) alignments: 4 + 16 = 20
samtools view -F 20 -b SRR1972739.bwa.bam > selected.bam
samtools index selected.bam

To select reverse alignments.

samtools view -F 4 -f 16 -b SRR1972739.bwa.bam > reverse_selected.bam
samtools index reverse_selected.bam
The -b option writes the output in BAM format.


To get an overview of alignments in a BAM file

samtools flagstat SRR1972739.bwa.bam

produces this

20740 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
740 + 0 supplementary
0 + 0 duplicates
15279 + 0 mapped (73.67% : N/A)
20000 + 0 paired in sequencing
10000 + 0 read1
10000 + 0 read2
14480 + 0 properly paired (72.40% : N/A)
14528 + 0 with itself and mate mapped
11 + 0 singletons (0.05% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
To see similar statistics with "bamtools stats":
bamtools stats -in SRR1972739.bwa.bam

**********************************************
Stats for BAM file(s): 
**********************************************

Total reads:       20740
Mapped reads:      15279    (73.6692%)
Forward strand:    14393    (69.3973%)
Reverse strand:    6347 (30.6027%)
Failed QC:         0    (0%)
Duplicates:        0    (0%)
Paired-end reads:  20740    (100%)
'Proper-pairs':    15216    (73.3655%)
Both pairs mapped: 15268    (73.6162%)
Read 1:            10357
Read 2:            10383
Singletons:        11   (0.0530376%)
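As a sanity check, the counts reported on this page can be tied together with a little arithmetic. The sketch below only re-derives numbers already shown above (the flagstat totals and the -f 4 / -F 4 counts). The variable names are chosen for this example.

```shell
# Counts copied from the samtools output shown on this page.
total=20740      # "in total" from flagstat
mapped=15279     # samtools view -c -F 4
unmapped=5461    # samtools view -c -f 4
paired=20000     # "paired in sequencing"
proper=14480     # "properly paired"

# Every alignment is either mapped or unmapped.
sum=$(( mapped + unmapped ))
echo "mapped + unmapped = $sum (total reported: $total)"

# flagstat reports mapped as a fraction of the total and
# properly paired as a fraction of "paired in sequencing".
mapped_pct=$(awk -v n="$mapped" -v d="$total" 'BEGIN { printf "%.2f", 100 * n / d }')
proper_pct=$(awk -v n="$proper" -v d="$paired" 'BEGIN { printf "%.2f", 100 * n / d }')
echo "mapped: ${mapped_pct}%"
echo "properly paired: ${proper_pct}%"
```

The two percentages agree with the 73.67% and 72.40% printed by flagstat.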

What is a proper-pair?

A proper (or concordant) pair is defined as "each segment properly aligned according to the aligner", meaning that the read pair aligns in an expected manner: the reads are oriented towards one another and the distance between their outer edges is within the expected range.

Types of Alignments

  1. Primary (representative) - the alignment that the aligner designates as the representative ("best") alignment for the read.
  2. Secondary - an alternative alignment for a read that aligns to multiple locations in the genome. This is caused primarily by repeats.
  3. Supplementary, or chimeric alignment - an alignment where different parts of the read match different regions of the genome, without the parts overlapping.

Each aligned read has one primary alignment and may also have secondary and supplementary alignments.

To select primary alignments, filter out the secondary and supplementary alignments (there is no flag for primary alignments, so they are whatever remains).

Use "samtools flags" to find the flags for secondary and supplementary alignments. Or check out "Decoding SAM flags" at https://broadinstitute.github.io/picard/explain-flags.html

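samtools flags SUPPLEMENTARY,SECONDARY
#prints the combined value 2304 (0x900), the sum of the individual flags:
#256    0x100   SECONDARY     .. secondary alignment
#2048   0x800   SUPPLEMENTARY .. supplementary alignment

#count the mapped primary alignments (-c prints a number, so do not redirect it into a BAM file)
samtools view -c -F 4 -F 2304 SRR1972739.bwa.bam

#write the mapped primary alignments to a BAM file
samtools view -b -F 4 -F 2304 SRR1972739.bwa.bam > output.bam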
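The values 20 and 2304 passed to -F on this page are combinations of individual flag bits. A small sketch of how such masks can be built with shell arithmetic follows. The variable names are made up for this example, and the single mask 2308 is shown as an assumed equivalent way to express "not unmapped, not secondary, not supplementary" in one value.

```shell
# Each flag is a single bit, so flags are combined with bitwise OR
# (which gives the same result as adding them).
forward_mapped_mask=$(( 4 | 16 ))           # UNMAP | REVERSE
non_primary_mask=$(( 256 | 2048 ))          # SECONDARY | SUPPLEMENTARY
primary_mapped_mask=$(( 4 | 256 | 2048 ))   # UNMAP | SECONDARY | SUPPLEMENTARY

echo "forward, mapped:  -F $forward_mapped_mask"
echo "primary only:     -F $non_primary_mask"
echo "primary, mapped:  -F $primary_mapped_mask"

# The same mask in hexadecimal, as printed by "samtools flags".
non_primary_hex=$(printf '0x%x' "$non_primary_mask")
echo "$non_primary_hex"
```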