
Lesson 7: Course Wrap-Up

Learning Objectives

  1. Introduce the QIIME 2 microbiome workflow for Biowulf
  2. Review key concepts
  3. Showcase additional plugins

QIIME 2 on Biowulf

As mentioned previously, QIIME 2 is installed on Biowulf.

To see available versions use

module avail qiime  
Also, check out the QIIME 2 Biowulf help page.

The default version on Biowulf is qiime2-2021.4, and the latest installed version is qiime2-2022.2.

If you are interested in a reproducible workflow to use on Biowulf, Samantha Chill, a bioinformatician with CCBR, created one that is readily available on GitHub.

Review


Image adapted from docs.qiime2.org (Conceptual Overview of QIIME 2).

What have we done?

Using a small subset of data:

  1. Imported raw FASTQ files using qiime tools import. The data were in paired-end CASAVA format.
  2. Checked for primers using qiime cutadapt trim-paired.
  3. Denoised with qiime dada2 denoise-paired and generated summaries of our feature table and representative sequences.

Using the larger data set:

  1. Filtered samples and features based on metadata categories and other thresholds.
  2. Classified our sequences using a Greengenes-trained (V4-V5) classifier and qiime feature-classifier classify-sklearn.
  3. Applied taxonomic filtering.
  4. Generated a de novo phylogenetic tree using qiime phylogeny align-to-tree-mafft-fasttree.
  5. Chose a rarefaction depth using the QIIME 2 feature table summary and rarefaction curve.
  6. Generated several core alpha and beta diversity metrics and visualizations.

These are our core steps, but let's also take a look at some of the other analysis plugins and methods available in QIIME 2.

Other plugins of interest

Differential abundance testing

Differential abundance testing examines which taxa differ significantly in abundance between conditions. However, sparsity, compositionality, and differences in library size make this difficult to determine.

Methods in QIIME 2

ANCOM (Analysis of Composition of Microbiomes)

  • uses an additive log-ratio approach
  • assumes that fewer than 25% of features change between groups
  • available through the q2-composition plugin
  • requires filtering out rare taxa first
  • reports a W statistic: the number of null hypotheses rejected for a feature

See the Moving Pictures tutorial.
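
To make the W statistic more concrete, here is a minimal Python sketch of the idea: for each feature, test its log ratio against every other feature and count how many of those tests reject the null hypothesis. This is not the q2-composition implementation. The exact permutation test, the 0.05 threshold, the lack of multiple-testing correction, and the toy counts are all assumptions made for illustration.

```python
import math
from itertools import combinations


def permutation_p_value(values, group_size):
    """Exact two-sided permutation test on the difference in group means.

    The first `group_size` entries of `values` form the observed first group.
    """
    n = len(values)

    def mean_diff(first_idx):
        first = [values[i] for i in first_idx]
        second = [values[i] for i in range(n) if i not in first_idx]
        return sum(first) / len(first) - sum(second) / len(second)

    observed = abs(mean_diff(tuple(range(group_size))))
    splits = list(combinations(range(n), group_size))
    # Count the relabelings that are at least as extreme as the observed split.
    extreme = sum(1 for s in splits if abs(mean_diff(s)) >= observed - 1e-12)
    return extreme / len(splits)


def w_statistics(counts, group_size, alpha=0.05):
    """For each feature, count the pairwise log-ratio tests that reject the null."""
    w = {}
    for feature in counts:
        rejected = 0
        for other in counts:
            if other == feature:
                continue
            log_ratios = [
                math.log(a / b) for a, b in zip(counts[feature], counts[other])
            ]
            if permutation_p_value(log_ratios, group_size) < alpha:
                rejected += 1
        w[feature] = rejected
    return w


# Made-up counts: the first four samples are "old", the last four are "young".
# Only feature A changes between the groups (it is four times higher in young).
toy_counts = {
    "A": [100, 110, 90, 105, 400, 440, 360, 420],
    "B": [50, 55, 45, 52, 50, 55, 45, 52],
    "C": [30, 33, 27, 31, 30, 33, 27, 31],
    "D": [20, 22, 18, 21, 20, 22, 18, 21],
}
w = w_statistics(toy_counts, group_size=4)
print(w)
```

In this toy table, feature A gets the highest W because its ratio to every other feature shifts between groups, while each unchanged feature is only rejected in its comparison against A.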

gneiss

ALDEx2

Note: Different methods produce different results, and all are sensitive to upstream processing decisions. ALDEx2 and ANCOM-II were found to be more conservative than other methods: they are less prone to false positives, at the cost of lower sensitivity, so they may miss some true differences.

ANCOM example

Let's use our practice data set to run ANCOM.

Step 1: Filter out low-abundance and low-prevalence ASVs. Note: this will shift the composition of the samples and thus could bias results.

mkdir ancom  

qiime feature-table filter-features \
  --i-table /data/practice/04_filter/filtered-table3.qza \
  --p-min-frequency 50 \
  --p-min-samples 2 \
  --o-filtered-table ancom/ancomfilt.qza  
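
As a rough illustration of what the two thresholds in this command mean, the Python sketch below applies the same kind of rule to a toy table: keep a feature only if its total count is at least the minimum frequency and it is observed in at least the minimum number of samples. The function name and the toy counts are hypothetical; this is not how the plugin is implemented.

```python
def filter_features(table, min_frequency=0, min_samples=0):
    """Keep features that are abundant and prevalent enough.

    `table` maps a feature ID to its list of per-sample counts.
    """
    kept = {}
    for feature, counts in table.items():
        total = sum(counts)                           # total frequency across samples
        present_in = sum(1 for c in counts if c > 0)  # number of samples with the feature
        if total >= min_frequency and present_in >= min_samples:
            kept[feature] = counts
    return kept


# Made-up counts for three ASVs across three samples.
toy_table = {
    "asv1": [40, 30, 0],  # total 70, seen in 2 samples: kept
    "asv2": [60, 0, 0],   # total 60, seen in 1 sample: dropped for low prevalence
    "asv3": [5, 5, 5],    # total 15, seen in 3 samples: dropped for low frequency
}
filtered = filter_features(toy_table, min_frequency=50, min_samples=2)
print(sorted(filtered))
```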

Step 2: Add pseudocounts. ANCOM works with log ratios, so it does not tolerate zeros.

qiime composition add-pseudocount \
  --i-table ancom/ancomfilt.qza \
  --o-composition-table ancom/comp-table.qza  
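
The short Python sketch below shows why zeros are a problem and what a pseudocount does about it. The sample names and counts are made up, and the pseudocount of 1 is an assumption chosen for illustration.

```python
import math


def add_pseudocount(table, pseudocount=1):
    """Add a constant to every count so that no zeros remain."""
    return {
        sample: [count + pseudocount for count in counts]
        for sample, counts in table.items()
    }


# Made-up counts for three features in two samples.
toy_table = {"sampleA": [120, 0, 30], "sampleB": [80, 15, 0]}

# A zero count makes the log ratio undefined.
try:
    math.log(toy_table["sampleA"][1] / toy_table["sampleA"][0])
except ValueError as err:
    print("log ratio with a zero count fails:", err)

# After adding the pseudocount, every log ratio is defined.
shifted = add_pseudocount(toy_table)
log_ratio = math.log(shifted["sampleA"][1] / shifted["sampleA"][0])
print(shifted["sampleA"], log_ratio)
```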

The only metadata category of interest in this data set is DataType (old vs. young). Now, let's run ANCOM.

qiime composition ancom \
  --i-table ancom/comp-table.qza \
  --m-metadata-file /data/practice/metadata.txt \
  --m-metadata-column DataType \
  --o-visualization ancom/ancom-OY.qzv  

Core microbiome

If interested in highly prevalent taxa, you could use qiime feature-table core-features, which identifies "features observed in a user-defined fraction of the samples." By default, this will return features observed in at least 50% of samples.
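
The idea behind a core microbiome can be sketched in a few lines of Python: compute the fraction of samples in which each feature is observed and keep the features at or above a chosen threshold. The function name and the toy table are hypothetical, and the plugin's own output (a visualization) is richer than this.

```python
def core_features(table, min_fraction=0.5):
    """Return features observed in at least `min_fraction` of the samples.

    `table` maps a feature ID to its list of per-sample counts.
    """
    core = []
    for feature, counts in table.items():
        prevalence = sum(1 for c in counts if c > 0) / len(counts)
        if prevalence >= min_fraction:
            core.append(feature)
    return core


# Made-up counts for three ASVs across four samples.
toy_table = {
    "asv1": [3, 0, 9, 1],  # observed in 3 of 4 samples
    "asv2": [0, 0, 4, 0],  # observed in 1 of 4 samples
    "asv3": [2, 5, 1, 7],  # observed in all samples
}
print(core_features(toy_table, min_fraction=0.5))
print(core_features(toy_table, min_fraction=1.0))
```

Raising the fraction toward 1.0 narrows the core to features found in every sample.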

Random forest regression and classification

Can we use microbial community composition to predict a condition? For example, maybe we are interested in whether microbial community composition can predict a cancer state from a non-cancer state.

In QIIME 2, we could use the q2-sample-classifier plugin, which uses supervised learning (default = Random Forest classification).

"Supervised learning classifiers predict the categorical metadata classes of unlabeled samples by learning the composition of labeled training samples." (Source: https://docs.qiime2.org/2022.8/tutorials/sample-classifier/)

Random Forest example

Let's use our practice data set again, and see if we can predict group membership (old vs young) by microbial composition. We will use the sample-classifier pipeline. This pipeline splits our data into training and testing sets, trains the model using the --p-estimator of choice, performs k-fold cross-validation (5 by default), tests the model on the test set, and calculates model accuracy by comparing true values versus predicted values of the test set.

qiime sample-classifier classify-samples \
  --i-table /data/practice/04_filter/filtered-table3.qza \
  --m-metadata-file /data/practice/metadata.txt \
  --m-metadata-column DataType \
  --p-optimize-feature-selection \
  --p-parameter-tuning \
  --p-estimator RandomForestClassifier \
  --p-random-state 123 \
  --output-dir rforest
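
To picture the bookkeeping the pipeline does around the model, here is a minimal Python sketch of the three pieces described above: holding out a test set, splitting the training set into k folds, and scoring accuracy as the share of correct predictions. It does not train a classifier. The 20% test fraction, the helper names, and the sample IDs are assumptions for illustration, and the real pipeline relies on scikit-learn rather than code like this.

```python
import random


def split_train_test(sample_ids, test_size=0.2, seed=123):
    """Shuffle the samples reproducibly and hold out a fraction for testing."""
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


def k_folds(train_ids, k=5):
    """Divide the training samples into k non-overlapping folds."""
    return [train_ids[i::k] for i in range(k)]


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Hypothetical sample IDs.
samples = [f"sample{i}" for i in range(20)]
train, test = split_train_test(samples)
folds = k_folds(train)
print(len(train), len(test), [len(fold) for fold in folds])

# Made-up true and predicted labels for four test samples.
score = accuracy(["old", "young", "old", "young"], ["old", "young", "young", "young"])
print(score)
```

During cross-validation, each fold takes one turn as the validation set while the model is fit on the remaining folds; the held-out test set is only used at the end.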

We can move our visualizations to ~/public to view some of these outputs.

Other notable plugins

DEICODE

  • compositional beta diversity with biplots
  • performs a Robust Aitchison PCA

q2-clawback

  • can improve taxonomic classifications
  • uses taxonomic weights based on environment

q2-picrust2

  • functional prediction from 16S rRNA data

q2-sidle

provenance_lib

  • provenance replay
  • generate reproducible code based on your QIIME 2 inputs and outputs

Exporting results

Many of the QIIME 2 visualizations allow you to directly download results in a tab-delimited format. You can also unzip any QIIME 2 visualization (.qzv) or QIIME 2 artifact (.qza) and access data in the data directory.

There is also an export method in QIIME 2 (qiime tools export). Let's export a feature table.

First, let's simply unzip a feature table artifact.

unzip -d filtered-table3 filtered-table-3.qza  

Now, let's use qiime tools export.

qiime tools export \
  --input-path filtered-table-3.qza \
  --output-path exported-feature-table-3
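
Because artifacts and visualizations are zip archives, you can also peek inside one from Python without extracting it. The sketch below lists the files stored under the archive's data directory, assuming the usual layout of a single top-level UUID folder that contains a data folder. It builds a toy archive in memory so it runs without a real .qza file; the UUID and file contents are made up.

```python
import io
import zipfile


def list_data_files(archive):
    """Return the files stored under the top-level <uuid>/data/ directory."""
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    data_files = []
    for name in names:
        parts = name.split("/")
        # Expect <uuid>/data/<file...>; skip directory entries, which end in "/".
        if len(parts) > 2 and parts[1] == "data" and parts[-1]:
            data_files.append("/".join(parts[2:]))
    return data_files


# Build a toy archive in memory that mimics the layout (placeholder contents).
toy = io.BytesIO()
with zipfile.ZipFile(toy, "w") as zf:
    zf.writestr("0000-toy-uuid/metadata.yaml", "type: FeatureTable[Frequency]\n")
    zf.writestr("0000-toy-uuid/data/feature-table.biom", "placeholder")

print(list_data_files(toy))
```

With a real artifact, you would pass the path to the .qza file instead of the in-memory object.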

Working in R

There are many packages available to work with microbiome data in R. While an R API for QIIME 2 is in the works, for now users can import QIIME 2 files into R with the qiime2R package.

Phyloseq is a fantastic Bioconductor package for microbiome analysis with R, and qiime2R can import QIIME 2 files as phyloseq objects.

Struggling with command line?

  1. Try the QIIME 2 Galaxy implementation
  2. Nephele
  3. MicrobiomeAnalyst

If you have any questions about your microbiome analysis, do not hesitate to email us at ncibtep@nih.gov.