Lesson 6 .

Learning Objectives

Introduce several beta diversity metrics
Discover different ordination methods
Learn about statistical methods that are applicable

Beta diversity

Beta diversity is between sample diversity. This is useful for answering the question, how different are these microbial communities?

Image modified from https://www.genome.gov/genetics-glossary/Microbiome

Beta diversity is measured using distance and dissimilarity metrics. The core-metrics-phylogenetic pipeline automatically produces Bray-Curtis, Jaccard, weighted UniFrac, and unweighted UniFrac. More on these below.

Distance and dissimilarity metrics

Bray-Curtis dissimilarity

quantitative
Takes into consideration abundance and presence absence

Jaccard
- qualitative - presence / absence - percentage of taxa not found in both samples

Weighted UniFrac

quantitative
similar to Bray-Curtis but takes into consideration phylogenetic relationships

Unweighted UniFrac

qualitative
like Jaccard focuses on presence / absence of taxa but also includes phylogenetic relationships
percentage of phylogenetic branch length not found in both samples

Aitchison

an answer to the compositional nature of the data
"euclidean distances between clr-transformed compositions" (Quinn et al. 2018).
a clr transformation sets the features in a data set relative to the geometric mean of the composition

What is compositional data?

Compositional data have two unique properties. First, the total sum of all component values (i.e. the library size) is an artifact of the sampling procedure (van den Boogaart and Tolosana-Delgado, 2008). Second, the difference between component values is only meaningful proportionally [e.g. the difference between 100 and 200 counts carries the same information as the difference between 1000 and 2000 counts (van den Boogaart and Tolosana-Delgado, 2008)].--- Quinn et al. 2018

See this paper for more information on compositional data.

Some other notable metrics are described here.

These methods result in large distance / dissimilarity matrices. In all methods, a value closer to zero indicates similarity between microbial communities, while a value closer to one indicates dissimilarity.

Beta rarefaction

Again, rarefaction is used to eliminate issues due to differences in library size prior to beta diversity. This method is built-in to QIIME 2 core metrics pipelines. We can examine the stability of a beta diversity metric using qiime diversity beta-rarefaction.

qiime diversity beta-rarefaction \
  --i-table filtered-table-3.qza \
  --p-metric braycurtis \
  --p-clustering-method nj \
  --p-sampling-depth 10000 \
  --m-metadata-file /data/sample-metadata.tsv \
  --o-visualization braycurtis-rarefaction-plot.qzv

This will rarefy your feature table multiple times at a given depth. The output provides a jacknifed emperor plot, with variability around a community represented by the ellipsoids around a point. A correlation heatmap and a UPGMA/NJ sample-clustering tree is also output.

Ordination methods

Methods to reduce dimensionality in the data and visualize trends in the data. The following list includes commonly used methods and is not exhaustive.

PCoA

most common
similar to PCA but works on distance metrics beyond euclidean
maximizes linear correlation
prone to the horseshoe effect (also observed in PCA)

UMAP (Uniform Manifold Approximation and Projection)

non-linear
can be used on multiple distance / dissimilarity metrics
improved resolution in clusters
More information here.

NMDS (Not available in QIIME 2)
- better for rank ordered data (e.g., Bray-Curtis)
- dimensions are specified
- stress indicates how well the ordination represents the data (stress < 0.1 ~ good)
- no single solution

See this resource for more information on ordination metrics.

Generating a PCoA and UMAP in QIIME2

PCoA

PCoA was included by default in our core-metrics-phylogenetic pipeline. Because these are longitudinal data, we will customize the axis to include the varaible, week-relative-to-hct.

qiime emperor plot \
  --i-pcoa diversity-core-metrics-phylogenetic/unweighted_unifrac_pcoa_results.qza \
  --m-metadata-file /data/sample-metadata.tsv diversity-core-metrics-phylogenetic/faith_pd_vector.qza diversity-core-metrics-phylogenetic/evenness_vector.qza diversity-core-metrics-phylogenetic/shannon_vector.qza \
  --p-custom-axes week-relative-to-hct \
  --o-visualization uu-pcoa-emperor-w-time.qzv
qiime emperor plot \
  --i-pcoa diversity-core-metrics-phylogenetic/weighted_unifrac_pcoa_results.qza \
  --m-metadata-file /data/sample-metadata.tsv diversity-core-metrics-phylogenetic/faith_pd_vector.qza diversity-core-metrics-phylogenetic/evenness_vector.qza diversity-core-metrics-phylogenetic/shannon_vector.qza \
  --p-custom-axes week-relative-to-hct \
  --o-visualization wu-pcoa-emperor-w-time.qzv

UMAP

First, we perform the ordination.

qiime diversity umap \
  --i-distance-matrix diversity-core-metrics-phylogenetic/unweighted_unifrac_distance_matrix.qza \
  --o-umap uu-umap.qza
qiime diversity umap \
  --i-distance-matrix diversity-core-metrics-phylogenetic/weighted_unifrac_distance_matrix.qza \
  --o-umap wu-umap.qza

Then we use emperor to plot. Though the input parameter is --i-pcoa, we can also input umap results.

qiime emperor plot \
  --i-pcoa uu-umap.qza \
  --m-metadata-file /data/sample-metadata.tsv diversity-core-metrics-phylogenetic/faith_pd_vector.qza diversity-core-metrics-phylogenetic/evenness_vector.qza diversity-core-metrics-phylogenetic/shannon_vector.qza \
  --p-custom-axes week-relative-to-hct \
  --o-visualization uu-umap-emperor-w-time.qzv
qiime emperor plot \
  --i-pcoa wu-umap.qza \
  --m-metadata-file /data/sample-metadata.tsv diversity-core-metrics-phylogenetic/faith_pd_vector.qza diversity-core-metrics-phylogenetic/evenness_vector.qza diversity-core-metrics-phylogenetic/shannon_vector.qza \
  --p-custom-axes week-relative-to-hct \
  --o-visualization wu-umap-emperor-w-time.qzv

When we view these files, there are many options for customizing our plot and toggling our view. Let's look at these in more detail.

UMAP (unweighted UniFrac): A 2-D representation of axis-1 vs week-relative-to-hct. We can toggle our view to see this in 3D.

Longitudinal trends are difficult to view here because the data are dependent, but these can be teased apart in greater detail using the q2-longitudinal plugin.

Let's take a look at the moving pictures data set for a clearer example.

Statistics

Some typical statitstical tests applied to beta diversity metrics include the following:

Adonis (PERMANOVA)

Similar to a MANOVA, but is permutational and non-parametric.
Sensitive to group dispersion, so it is worth running alongside a beta-dispersion method.
generates a pseudo-F ratio; larger pseudo-F suggests larger group separation.
requires data independence

ANOSIM (Analysis of Similarity)

uses a ranked approach (complementary to NMDS)
The ANOSIM statistic compares the mean of ranked dissimilarities between groups to the mean of ranked dissimilarities within groups. An R value close to "1.0" suggests dissimilarity between groups while an R value close to "0" suggests an even distribution of high and low ranks within and between groups. R values below "0" suggest that dissimilarities are greater within groups than between groups. --- gustame documentation
Also sensitive to differences in group dispersion.

These methods, including a permutational dispersion test, can be run using qiime diversity beta-group-significance. The PERMANOVA implementation here is one-way. To include more than one variable with potential interactions, use qiime diversity adonis.

Again, because we are looking at longitudinal data, these are not as relevant in this specific case.