Lesson 7: Course Wrap-Up

Learning Objectives

Introduce the QIIME 2 microbiome workflow for Biowulf
Review key concepts
Showcase additional plugins

QIIME 2 on Biowulf

As mentioned previously, QIIME 2 is installed on Biowulf.

To see the available versions, use:

module avail qiime

Also, check out the QIIME 2 Biowulf help page.

The default version on Biowulf is qiime2-2021.4, and the latest installed version is qiime2-2022.2.

If you are interested in a reproducible workflow to use on Biowulf, Samantha Chill, a bioinformatician with CCBR, created a workflow that is readily available on GitHub.

Review

Image adapted from docs.qiime2.org (Conceptual Overview of QIIME 2).

What have we done?

Using a small subset of data:

Imported raw fastq files using qiime tools import. The data were paired-end reads in CASAVA format.
Checked for primers using qiime cutadapt trim-paired.
Denoised with qiime dada2 denoise-paired and generated summaries of our feature table and representative sequences.

Using the larger data set:

Filtered samples and features based on metadata categories and other thresholds.
Classified our sequences using a Greengenes-trained (V4-V5) classifier and qiime feature-classifier classify-sklearn.
Applied taxonomic filtering.
Generated a de novo phylogenetic tree using qiime phylogeny align-to-tree-mafft-fasttree.
Chose a rarefaction depth using the QIIME 2 feature table summary and rarefaction curve.
Generated several core alpha and beta diversity metrics and visualizations.

These are our core steps, but let's also take a look at some of the other analysis plugins and methods available in QIIME 2.

Other plugins of interest

Differential abundance testing

Differential abundance testing examines which taxa differ significantly in abundance between conditions. However, sparsity, compositionality, and differences in library size make this difficult to determine.

Methods in QIIME 2

ANCOM (Analysis of Composition of Microbiomes)

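uses an additive log-ratio approach
assumes that fewer than 25% of features change between groups
available through the q2-composition plugin
requires filtering out rare taxa
reports a W statistic: for each feature, the number of pairwise log-ratio tests in which the null hypothesis is rejected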

See the Moving Pictures tutorial.
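
To make the W statistic more concrete, here is a minimal Python sketch that counts, for each feature, how many pairwise log-ratio tests reject the null hypothesis. It is an illustration only, not the q2-composition implementation: the toy table, the function names, and the exact permutation test are all assumptions, and the test was chosen simply because it needs no external libraries. No multiple-testing correction is applied.

```python
from itertools import combinations
from math import log


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    n_extreme = 0
    n_splits = 0
    for chosen in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in chosen)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so floating-point noise does not hide ties.
        if diff >= observed - 1e-12:
            n_extreme += 1
        n_splits += 1
    return n_extreme / n_splits


def w_statistics(table, groups, alpha=0.05):
    """Count rejected pairwise log-ratio tests per feature (no zeros allowed)."""
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("this sketch only handles two groups")
    w = {}
    for feature, counts in table.items():
        w[feature] = 0
        for other, other_counts in table.items():
            if other == feature:
                continue
            ratios = [log(x / y) for x, y in zip(counts, other_counts)]
            a = [r for r, g in zip(ratios, groups) if g == labels[0]]
            b = [r for r, g in zip(ratios, groups) if g == labels[1]]
            if permutation_p_value(a, b) < alpha:
                w[feature] += 1
    return w


# Hypothetical toy data: only ASV_1 changes between the two groups.
groups = ["old"] * 4 + ["young"] * 4
toy_table = {
    "ASV_1": [10, 12, 11, 9, 100, 120, 110, 90],
    "ASV_2": [20, 10, 20, 10, 20, 10, 20, 10],
    "ASV_3": [10, 10, 10, 10, 10, 10, 10, 10],
}
print(w_statistics(toy_table, groups))  # {'ASV_1': 2, 'ASV_2': 1, 'ASV_3': 1}
```

ASV_1 gets the highest W because its ratio to every other feature shifts between groups, while ASV_2 and ASV_3 only differ in their ratio to ASV_1.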

gneiss

uses balance trees (isometric log-ratio transformation)
requires filtering out rare taxa
Check out this explanation

ALDEx2

center log ratio approach
used for multiple types of NGS data (e.g., RNA-Seq)
tested for QIIME 2 version 2019.7
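
As a rough illustration of the centered log-ratio (CLR) idea, the Python sketch below subtracts the mean log abundance of a sample from each log abundance. It covers only the transform itself, not the rest of what ALDEx2 does, and the function name and toy values are assumptions.

```python
from math import log


def clr(composition):
    """Centered log-ratio: log of each part minus the mean of the logs.

    All values must be positive, so zeros need handling beforehand.
    """
    logs = [log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical counts for three features in one sample.
sample = [1, 10, 100]
print([round(v, 3) for v in clr(sample)])

# Doubling every count leaves the CLR values unchanged (up to rounding),
# which is why the transform is insensitive to library size.
doubled = [2 * x for x in sample]
print([round(v, 3) for v in clr(doubled)])
```

The transformed values always sum to zero, so each one is read relative to the sample's geometric mean rather than as an absolute abundance.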

Note: Different methods produce different results, and all methods are sensitive to upstream processing decisions. ALDEx2 and ANCOM-II were found to be more conservative than other methods: they are less prone to false positives, but they are also less sensitive, so they may miss some true differences.

ANCOM example

Let's use our practice data set to run ANCOM.

Step 1: Filter out low-abundance and low-prevalence ASVs. Note: this will shift the composition of the samples and thus could bias results.

mkdir ancom

qiime feature-table filter-features \
  --i-table /data/practice/04_filter/filtered-table3.qza \
  --p-min-frequency 50 \
  --p-min-samples 2 \
  --o-filtered-table ancom/ancomfilt.qza
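
To show what the two thresholds mean, here is a small Python sketch that applies the same kind of rule to a toy table: a feature is kept only if its total count is at least 50 and it is observed in at least 2 samples. The dictionary layout and function name are assumptions for illustration; QIIME 2 works on BIOM tables internally.

```python
def filter_features(table, min_frequency=0, min_samples=0):
    """Keep features that meet both the total-count and prevalence thresholds.

    table maps a feature ID to a list of counts, one per sample.
    """
    kept = {}
    for feature, counts in table.items():
        total = sum(counts)                            # summed over all samples
        prevalence = sum(1 for c in counts if c > 0)   # samples with any reads
        if total >= min_frequency and prevalence >= min_samples:
            kept[feature] = counts
    return kept


# Hypothetical counts across four samples.
toy_table = {
    "ASV_1": [30, 25, 0, 10],  # total 65, in 3 samples: kept
    "ASV_2": [60, 0, 0, 0],    # total 60, in 1 sample: dropped (prevalence)
    "ASV_3": [5, 5, 5, 5],     # total 20, in 4 samples: dropped (frequency)
}
print(sorted(filter_features(toy_table, min_frequency=50, min_samples=2)))
# ['ASV_1']
```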

Step 2: Add pseudocounts. ANCOM works with log ratios, so it does not tolerate zeros.

qiime composition add-pseudocount \
  --i-table ancom/ancomfilt.qza \
  --o-composition-table ancom/comp-table.qza
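
The reason zeros are a problem can be seen in a few lines of Python. This is a sketch under simple assumptions (a pseudocount of 1 added to every count, toy values, hypothetical function names), not the plugin's code.

```python
from math import log


def add_pseudocount(counts, pseudocount=1):
    """Add a constant to every count so that no value is zero."""
    return [c + pseudocount for c in counts]


def log_ratio(x, y):
    """Log of the ratio between two counts."""
    return log(x / y)


raw = [0, 9, 4]  # hypothetical counts for three features in one sample

try:
    log_ratio(raw[0], raw[1])  # log(0 / 9) is undefined
except ValueError as error:
    print("zero count breaks the log ratio:", error)

shifted = add_pseudocount(raw)
print(shifted)                                      # [1, 10, 5]
print(round(log_ratio(shifted[0], shifted[1]), 3))  # -2.303
```

Note that the pseudocount changes every ratio slightly, which is one reason the rare-feature filtering in Step 1 matters.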

Step 3: Run ANCOM. The only metadata category of interest in this data set is DataType (old vs. young).

qiime composition ancom \
  --i-table ancom/comp-table.qza \
  --m-metadata-file /data/practice/metadata.txt \
  --m-metadata-column DataType \
  --o-visualization ancom/ancom-OY.qzv

Core microbiome

If you are interested in highly prevalent taxa, you can use qiime feature-table core-features, which identifies "features observed in a user-defined fraction of the samples." By default, the minimum fraction is 0.5, so it returns features observed in at least 50% of samples.
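
The underlying idea is simple enough to sketch in Python: a feature is "core" at a given fraction if it is observed in at least that share of samples. The toy table, function name, and the three fractions tried below are assumptions for illustration only.

```python
def core_features(table, fraction=0.5):
    """Return features observed (count > 0) in at least `fraction` of samples."""
    core = []
    for feature, counts in table.items():
        observed_in = sum(1 for c in counts if c > 0)
        if observed_in / len(counts) >= fraction:
            core.append(feature)
    return core


# Hypothetical counts across four samples.
toy_table = {
    "ASV_1": [12, 7, 3, 20],  # present in 4 of 4 samples
    "ASV_2": [0, 5, 0, 9],    # present in 2 of 4 samples
    "ASV_3": [0, 0, 0, 4],    # present in 1 of 4 samples
}

for fraction in (0.5, 0.75, 1.0):
    print(fraction, core_features(toy_table, fraction))
# 0.5 ['ASV_1', 'ASV_2']
# 0.75 ['ASV_1']
# 1.0 ['ASV_1']
```

Raising the fraction can only shrink the core set, so comparing a few thresholds shows how stable the core is.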

Random forest regression and classification

Can we use microbial community composition to predict a condition? For example, maybe we are interested in whether microbial community composition can distinguish a cancer state from a non-cancer state.

In QIIME 2, we could use the q2-sample-classifier plugin, which uses supervised learning (the default estimator is a Random Forest classifier).

"Supervised learning classifiers predict the categorical metadata classes of unlabeled samples by learning the composition of labeled training samples." (Source: https://docs.qiime2.org/2022.8/tutorials/sample-classifier/)

Random Forest example

Let's use our practice data set again to see whether we can predict group membership (old vs. young) from microbial composition. We will use the classify-samples pipeline from q2-sample-classifier. This pipeline splits the data into training and test sets, trains a model using the chosen --p-estimator, performs k-fold cross-validation (k = 5 by default), applies the model to the test set, and calculates accuracy by comparing the true and predicted values for the test set.

qiime sample-classifier classify-samples \
  --i-table /data/practice/04_filter/filtered-table3.qza \
  --m-metadata-file /data/practice/metadata.txt \
  --m-metadata-column DataType \
  --p-optimize-feature-selection \
  --p-parameter-tuning \
  --p-estimator RandomForestClassifier \
  --p-random-state 123 \
  --output-dir rforest
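
Two of the ideas in this pipeline, k-fold splitting and accuracy, can be sketched in plain Python. This is a simplified illustration, not what the plugin runs: the split below is unshuffled and unstratified, no model is trained, and the labels and function names are made up for the example.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, validation) index lists for k roughly equal folds."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Each sample lands in the validation set exactly once across the 5 folds.
for train, validation in k_fold_indices(10, k=5):
    print("train:", train, "validation:", validation)

# Hypothetical test-set labels and predictions.
true_labels = ["old", "old", "young", "young", "old"]
predicted = ["old", "young", "young", "young", "old"]
print(accuracy(true_labels, predicted))  # 0.8
```

Accuracy on the held-out test set is the number to trust, because cross-validation scores come from data the model selection step has already seen.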

We can move our visualizations to ~/public to view some of these outputs.

Other notable plugins

DEICODE

compositional beta diversity with biplots
performs a Robust Aitchison PCA

q2-clawback

can improve taxonomic classifications
uses taxonomic weights based on environment

q2-picrust2

functional prediction from 16S rRNA data

q2-sidle

a new implementation of the Short MUltiple Regions Framework (SMURF)

provenance_lib

provenance replay
generate reproducible code based on your QIIME 2 inputs and outputs

Exporting results

Many of the QIIME 2 visualizations allow you to directly download results in a tab-delimited format. You can also unzip any QIIME 2 visualization (.qzv) or QIIME 2 artifact (.qza) and access data in the data directory.

There is also an export method in QIIME 2 (qiime tools export). Let's export a feature table.

First, let's simply unzip a feature table artifact.

unzip -d filtered-table3 filtered-table-3.qza

Now, let's use qiime tools export.

qiime tools export \
  --input-path filtered-table-3.qza \
  --output-path exported-feature-table-3
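
Both approaches work because a .qza or .qzv file is a zip archive with a top-level directory named after the artifact's UUID, and the payload sits in that directory's data folder. The Python sketch below lists the payload files using only the standard library. The function name is hypothetical, and the toy archive it builds is a stand-in with placeholder contents, not a real artifact.

```python
import os
import tempfile
import zipfile


def list_data_files(archive_path):
    """Return file paths found under <uuid>/data/ inside a QIIME 2 archive."""
    files = []
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Expected layout: <uuid>/data/<file>; skip directory entries.
            if len(parts) >= 3 and parts[1] == "data" and parts[-1]:
                files.append("/".join(parts[2:]))
    return sorted(files)


# Build a toy archive that mimics the layout (contents are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy-table.qza")
    with zipfile.ZipFile(toy_path, "w") as toy:
        toy.writestr("1234-abcd/VERSION", "placeholder")
        toy.writestr("1234-abcd/metadata.yaml", "placeholder")
        toy.writestr("1234-abcd/data/feature-table.biom", "placeholder")
    print(list_data_files(toy_path))  # ['feature-table.biom']
```

On a real feature table artifact, the same function would be called with the path to the .qza file.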

Working in R

There are many R packages available for working with microbiome data. An R API for QIIME 2 is in the works; for now, you can use the R package qiime2R to easily import QIIME 2 files.

The Bioconductor package phyloseq is a fantastic tool for microbiome analysis in R, and qiime2R can import QIIME 2 files as phyloseq objects.

Struggling with the command line?

Try the QIIME 2 Galaxy implementation
Nephele
MicrobiomeAnalyst

If you have any questions about your microbiome analysis, do not hesitate to email us at ncibtep@nih.gov.