---
title: "Creating a Volcano Plot"
subtitle: "Volcano Quarto Demonstration"
title-block-banner: true
author: "Alex Emmons, Ph.D."
date: last-modified
date-format: "MM/DD/YYYY"
format:
  html:
    df-print: paged
    embed-resources: true
    code-fold: true
    code-summary: "Show code"
    code-tools: true
    code-copy: false
    code-overflow: wrap
    page-layout: full
    theme:
      light: [flatly, resources/custom.scss]
      dark: darkly
    toc: true
    toc-location: body
bibliography:
  - resources/references.bibtex
  - resources/grateful-refs.bib
csl: resources/apa.csl
params:
  data: "./deseq2_DEGs.csv"
---

## Introduction

Here we will create a volcano plot from differential expression results. A volcano plot is a type of scatter plot commonly used in RNA-Seq analysis to highlight genes that may be biologically significant. The log2 fold change in expression is plotted on the x-axis, and statistical significance (the negative log10 of the p-value) is plotted on the y-axis. Learn more about volcano plots in this Galaxy Training Network tutorial [@doyle2018volcano].

As discussed by @doyle2018volcano, volcano plots help identify...

::: {.callout-tip}
Labels are Ensembl IDs. For a more useful figure, add an annotation step that maps them to gene symbols.
:::

## {{< fa solid volcano >}} Create a Volcano Plot from DESeq2 differential expression results

### Load the libraries

```{r}
#| label: packages
#| message: false
#| warning: false
library(EnhancedVolcano)
library(dplyr)
library(gt)
library(grateful)
```

### Load the data from command line arguments

The path to the results file is supplied through the `data` parameter (`params$data`). Genes whose adjusted p-values were NA were removed; these were genes excluded by `DESeq2` as a part of independent filtering.

```{r}
#| label: load-data
# Read the DESeq2 results (gene IDs as row names) and drop genes without an adjusted p-value
data <- read.csv(params$data, row.names = 1) %>%
  filter(!is.na(padj))
data
```

### Summary Table

@tbl-top-degs presents the top 10 differentially expressed genes identified in our analysis, ranked by adjusted p-value with ties broken by absolute log2 fold change. These genes show the strongest evidence of differential expression between treated and untreated samples, as we will see in @fig-volcano-plot.
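
The ranking behind the summary table can be illustrated on a small, made-up data frame. This is a minimal base R sketch, not part of the analysis: the object `toy`, its gene IDs, and its values are invented for illustration, and it is shown as a static block so it does not run when the report is rendered. It orders by adjusted p-value and breaks ties by the larger absolute log2 fold change, mirroring the `arrange()` call used for the table.

```r
# Hypothetical stand-in for DESeq2 results; IDs and values are invented.
toy <- data.frame(
  log2FoldChange = c(0.4, -2.5, 3.1, 1.2, -0.8),
  padj           = c(0.20, 1e-8, 1e-8, 1e-3, 0.04),
  row.names      = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

# Sort by ascending padj; within ties, by descending |log2FoldChange|.
ranked <- toy[order(toy$padj, -abs(toy$log2FoldChange)), ]

# Keep the three highest-ranked genes.
top_toy <- head(ranked, 3)
rownames(top_toy)
# geneB and geneC tie on padj, so geneC (|3.1| > |-2.5|) ranks first:
# [1] "geneC" "geneB" "geneD"
```

Breaking ties on effect size keeps the ordering stable when several genes share the same adjusted p-value, which is common at the top of a results table.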

```{r}
#| label: tbl-top-degs
#| tbl-cap: "Top 10 differentially expressed genes by adjusted p-value"
# Rank genes by adjusted p-value (ties broken by effect size) and format the top 10
data %>%
  tibble::rownames_to_column("Gene_ids") %>%
  arrange(padj, desc(abs(log2FoldChange))) %>%
  head(10) %>%
  select(Gene_ids, log2FoldChange, padj) %>%
  gt() %>%
  fmt_scientific(
    columns = padj,
    exp_style = "E"
  ) %>%
  fmt_number(
    columns = log2FoldChange,
    decimals = 2
  )
```

### Plot

Create the subset of gene labels to show on the plot: the five genes with the smallest adjusted p-values.

```{r}
#| label: create-labels
# Order explicitly so the labels do not depend on the row order of the input file
labs <- head(rownames(data)[order(data$padj, -abs(data$log2FoldChange))], 5)
```

The R package EnhancedVolcano was used to visually identify genes that are statistically significant and have large fold changes. The results are shown in @fig-volcano-plot.

```{r}
#| label: fig-volcano-plot
#| fig-cap: "Enhanced Volcano Plot of bulk RNA-seq data from the package airway"
#| warning: false
#| fig-alt: "Scatter plot showing the negative log10 adjusted p-values on the y-axis and log2 fold change on the x-axis, with points colored by significance and labeled with gene IDs."
#| fig-width: 6
#| fig-height: 6
EnhancedVolcano(data,
  title = "Enhanced Volcano with Airways",
  lab = rownames(data),
  selectLab = labs,
  labSize = 3,
  drawConnectors = TRUE,
  x = 'log2FoldChange',
  y = 'padj')
```

### Plot with a dynamic caption

::: {#fig-volcano-inline}

```{r}
#| warning: false
#| fig-width: 6
#| fig-height: 6
EnhancedVolcano(data,
  title = "Enhanced Volcano with Airways",
  lab = rownames(data),
  selectLab = labs,
  labSize = 3,
  drawConnectors = TRUE,
  x = 'log2FoldChange',
  y = 'padj')
```

Enhanced Volcano Plot of bulk RNA-seq data from the package airway. The top 5 significant DEGs (`{r} labs`) are labeled.

:::

@fig-volcano-inline includes a dynamic caption.

## Packages

This report relied on the following R packages:

```{r}
# Collect citations for the packages used and write the .bib file to resources/
pkgs <- cite_packages(output = "table", out.dir = "./resources/")
knitr::kable(pkgs)
```

This report was generated with Quarto, version {{< version >}}.

## References

::: {#refs}
:::