Gene Set Enrichment Analysis (GSEA) with R

Lesson Objectives

Introduce GSEA
Discuss options for GSEA in R
Demo GSEA in R

What is GSEA?

Gene Set Enrichment Analysis (GSEA) is a popular and heavily cited method used for functional enrichment / pathway analysis that "determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes)."

By looking at the expression patterns across all genes in a data set, GSEA allows researchers to identify biological pathways or processes that are significantly associated with a particular phenotype or condition, even when individual genes may not show significant changes in expression. Unlike over-representation analysis, it is not sensitive to arbitrary user-defined choices like gene selection.

Like other functional enrichment tools, GSEA can help prioritize candidate genes, proteins, or metabolites for further investigation and provide insights into the underlying biology of complex diseases.

Compare with DAVID

In a past Coding Club, we looked at ORA with DAVID. You can check out that lesson here: (An Introduction to DAVID for Functional Enrichment Analysis).

Steps in a GSEA Analysis

1) Create a ranked list of genes. More on ranking metrics below.
2) Select gene sets to test. There are many available databases with pre-defined gene sets. You should choose gene sets relevant to your experimental conditions / hypotheses.

Gene set does NOT equal pathway

Gene sets are collections of genes formed on the basis of shared biological or functional properties including molecular interactions, regulation, molecular product(s), and even phenotype associations. A gene set can represent a pathway, but not all gene sets are pathways.

3) Calculate the enrichment score (ES). The ES reflects the degree to which the genes in the set are over-represented at the top or bottom of the ranked list.
4) Calculate significance. A modified Komologorov-Smirnov (KS) test is used to determine whether the distribution of genes in a gene set within the ranked list is non-random. In a pre-ranked test, a null distribution of ES is generated by permuting the gene labels, and the observed ES is compared to this null distribution to determine statistical significance. Statistical procedures for p-value generation vary.
5) Apply multiple testing correction. Correct nominal p-value for multiple testing of many gene sets often using false discovery rate (FDR) or Benjamin-Hochberg.
6) Visualize results.

An enrichment score is calculated by walking down the list of features, increasing a running-sum statistic when a feature in the target feature set is encountered and decreasing it when it is not. The final score is the maximum deviation from zero encountered in the random walk. A normalized score can be obtained by computing the z-score of the estimate compared to a null distribution obtained from N random permutations. The normalized score accounts for differences in gene set size and can be used to compare the enrichment of different gene sets. The genes in the leading edge (see red box) are the genes driving the enrichment (Figure 1).

Things to note

1) The original implementation of GSEA is the BROAD's GSEA Software. The original publication is here. This software has implementations for self-contained tests (using sample permutations) and competitive tests (using gene label permutations). The original self-contained method is more robust but requires larger sample sizes. There is a modified version of this in R, but it is not well documented (See fgsea::fgseaLabel()).

Self-contained vs Competitive

Self-contained hypothesis:

Null hypothesis: "no genes in a given gene set are differentially expressed.
Permutes phenotype labels
"Focuses on the genes in a given gene set and ignores genes outside the set, providing strong statistical power and rejecting more null hypotheses".
Requires at least 7 samples per condition

Competitive hypothesis: (pre-ranked GSEA by default)

Null hypothesis: "genes in a given gene set are at most as often differentially expressed as the genes not in the set."
Permutes gene labels.
"Compares genes within a set to genes outside the set."
Ignores gene-gene correlations leading to more false positives.
Details from Powers et al. 2018

2) Here, we are using a pre-ranked implementation of GSEA known as FGSEA, which is also a competitive method. This method is fast, scalable, and more appropriate for data sets with smaller numbers of replicates (due to gene sampling). Read more about FGSEA and differences from the original pre-ranked GSEA here. In Particular, in FGSEA the p-value estimation is based on an adaptive multi-level split Monte-Carlo scheme, while the multiple comparison correction uses standard BH-adjusted p-values, which makes this method less conservative than the Broad GSEA.

Tools to perform GSEA

There are several tools and packages available for performing GSEA, including but not limited to the following:

GSEA software: The original GSEA software developed by the Broad Institute, which provides a user-friendly interface for performing GSEA and visualizing results.
clusterProfiler: An R package that provides functions for GSEA and visualization of results. It supports various gene set databases (e.g., GO, Reactome, KEGG, etc.) and allows for easy integration with other R packages.
fgsea: An R package that implements a fast version of GSEA, allowing for large-scale analysis of gene sets. Used by clusterProfiler.
WebGestalt: A web-based tool that provides a comprehensive platform for gene set analysis, including GSEA, pathway analysis, and functional enrichment analysis.
Qlucore: A web-based tool that provides a user-friendly interface for performing GSEA and visualizing results. It supports various gene set databases and allows for easy integration with other tools. CCR licenses available.
PartekFlow: A commercial software package that provides a comprehensive platform for gene expression analysis, including GSEA and pathway analysis. It offers a user-friendly interface and supports various gene set databases. NCI licenses available.

Why Use GSEA in R?

The Broad version of GSEA minimally requires normalized expression data from RNA-Seq Experiments. For example, you can use:
- normalised RNA-seq counts via DESeq2's 'geometric' normalization, EdgeR's TMM method, etc.
- normalised + transformed RNA-seq expression levels, such as variance-stabilized (vst) or regularized log (rlog) expression levels from DESeq2, or log2 CPMs from EdgeR --- Biostars.
Many of the packages used for normalization and differential expression analysis are already available in R, making it a convenient environment for performing GSEA.
R facilitates programmatic implementation of GSEA via a pipeline, and FGSEA enables fast and scalable analysis.
With R, you can more easily visualize and customize GSEA results.

`clusterProfiler`

While the fast pre-ranked GSEA (FGSEA) is available through Bioconductor in the package fgsea, it is also implemented in clusterProfiler and other packages.

clusterProfiler is an R package used for functional enrichment analysis (either ORA or GSEA). See the latest publication here. The primary documentation for clusterProfiler can be found here.

Being a Bioconductor package, clusterProfiler is a good choice because

It has ORA and GSEA support,
Functionality for multiple ontology and pathway databases, and
It allows user defined databases and annotations, which is particularly useful for non-model organisms.

If you are interested in the general capabilities of clusterProfiler, see this 2023 tutorial from the BTEP Coding Club.

Load the Packages

library(clusterProfiler) #Bioconductor
library(tidyverse) #CRAN
library(org.Hs.eg.db) #Bioconductor
library(msigdbr) #CRAN

The Data

GSEA analysis requires two inputs, a ranked gene list and gene set collections.

For this example, we will create a ranked gene list using differential expression results from DESeq2 from a tumor-normal experiment. To obtain the data download here.

Load the differential expression results:

degs <- read_csv("deg_NormVTumor.csv") |> 
  relocate(GeneName, .after=EnsemblID)

Rows: 24456 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): EnsemblID, GeneName
dbl (7): baseMean, log2FoldChange, lfcSE, stat, pvalue, padj, posteriorLFC

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

degs

# A tibble: 24,456 × 9
   EnsemblID       GeneName baseMean log2FoldChange  lfcSE  stat pvalue  padj
   <chr>           <chr>       <dbl>          <dbl>  <dbl> <dbl>  <dbl> <dbl>
 1 ENSG00000175899 A2M        19058.           7.45 0.0708 105.       0     0
 2 ENSG00000205002 AARD       11578.          10.2  0.232   44.0      0     0
 3 ENSG00000090861 AARS1      37863.           2.30 0.0150 153.       0     0
 4 ENSG00000165029 ABCA1       2354.           4.14 0.0786  52.7      0     0
 5 ENSG00000144452 ABCA12     19414.          -9.79 0.154  -63.7      0     0
 6 ENSG00000107331 ABCA2       5039.           3.18 0.0443  71.8      0     0
 7 ENSG00000103222 ABCC1      36135.           2.08 0.0140 148.       0     0
 8 ENSG00000108846 ABCC3      16944.          10.5  0.209   50.1      0     0
 9 ENSG00000125257 ABCC4       8719.          -2.17 0.0274 -79.3      0     0
10 ENSG00000143322 ABL2       14963.           2.30 0.0214 108.       0     0
# ℹ 24,446 more rows
# ℹ 1 more variable: posteriorLFC <dbl>

Ranking

We are using a pre-ranked version of GSEA (FGSEA). Therefore, the first step is to rank our genes. This can be done in many different ways, and the ranking metric chosen can bias results. You should choose an appropriate ranking metric based on your experimental design and how the data was processed.

Choosing a ranking metric

Commonly used ranking metrics include:

Fold Change (preferably shrunken values if DESeq2)
- Genes are ranked by the magnitude of change between conditions, but inflated fold changes are often observed for genes with small counts or large standard errors. "To generate more accurate log2 foldchange estimates, DESeq2 allows for the shrinkage of the LFC estimates toward zero when the information for a gene is low." Shrunken values are often suggested for ranking purposes.
Combination of p-value and fold change (e.g., rank by p-value, then by log-fold change)
Signed -log10(p-value) (e.g., sign(df$log2fc)*(-log10(df$pval))) - combines significance with the direction of change. However, 0 p-values will result in Inf/-Inf and require correction (See here).
Test Statistic (e.g., t-test statistic, Wald test statistic from DESeq2) - This includes information about the effect size and variability. It is conceptually similar to using the p-value.
Signal-to_Noise Ratio - (mean(expression in group 1) - mean(expression in group 2)) / (sd(group 1) + sd(group 2)) - Used by the Broad Institute GSEA software but requires a minimum of 7 samples per group to be valid.

Create the ranked list

We will use the shrunken log fold change values for this example, which are denoted as "posteriorLFC" in degs. We need to create a named vector to hold our ranked data.

ranked<-degs %>% 
  filter(!is.na(padj)) %>% #filter any NA values
  arrange(desc(posteriorLFC)) %>% #arrange the data by ranking metric
  pull(posteriorLFC,name=EnsemblID) #create named vector

head(ranked)

ENSG00000169247 ENSG00000130881 ENSG00000087116 ENSG00000128045 ENSG00000187627 
       14.86425        14.20553        13.96218        13.74109        13.67227 
ENSG00000182013 
       13.16344

Gene IDs

There are many options for gene IDs. Many tools prefer Entrez IDs or official gene symbols. However, you can convert between IDs fairly easily with clusterProfiler. Keep in mind that gene annotations may differ between genome builds, so take care when converting IDs or using different databases. Here, I am sticking with Ensembl IDs to avoid issues with duplicated gene symbols. Regardless of the ID that you choose, it is important that the gene IDs in your ranked list match the gene IDs in the gene sets tested.

Running GSEA

GSEA can be performed using GO (gseGO), KEGG pathways (gseKEGG), KEGG modules (gseMKEGG), and WikiPathways (gseWP). With the DOSE package, you can use DisGeNET (DOSE::gseDGN), Disease Ontology (DOSE::gseDO), and Network of Cancer Genes and Healthy Drivers (DOSE::gseNCG), and with ReactomePA (ReactomePA::gsePathway). Additionally, there is a function (GSEA()) that can be used in the event that a user is working with "unsupported organisms, slim versions of GO, novel functional annotation (e.g. GO via BlastGO or KEGG via KAAS), unsupported ontologies/pathways or customized annotations."

Obtain the gene sets

What is MSigDB and how does it relate to GSEA?

The Molecular Signatures Database (MSigDB) is a curated resource of thousands of gene sets by the Broad Institute. These sets were curated for use with GSEA software but are used with other tools as well.

MSigDB includes both human and mouse collections, and there are 35,134 gene sets in the most recent release of the Human Molecular Signatures Database (MSigDB v2025.1.Hs updated June 2025).

There are 9 larger themed collections:

Highlighted examples:

C5 - the gene ontology (GO) collection. C2 - curated gene sets from publications and pathway databases (e.g., KEGG and REACTOME).
Hallmark collection - a summary list of well-defined gene sets with decreased redundancy. A good choice for many studies.
C7 - great for immunological research.

There is not a GSEA specific function from clusterProfiler incorporating the MSigDB gene sets, but we can use GSEA() with gene sets obtained from MSigDB. Signatures can be obtained by:

Downloading directly from MSigDB and using read.gmt() from clusterProfiler to parse the gmt files. Only official gene symbols and Entrez IDs are available in the gmt files. However, you can also access the chip files for Ensembl IDs and other annotations.
Using library(msigdbr).

The msigdbr R package provides Molecular Signatures Database (MSigDB) gene sets typically used with the Gene Set Enrichment Analysis (GSEA) software:

in an R-friendly “tidy” format with one gene pair per row.

for multiple frequently studied model organisms, such as mouse, rat, pig, zebrafish, fly, and yeast, in addition to the original human genes

as gene symbols as well as NCBI Entrez and Ensembl IDs without accessing external resources requiring an active internet connection ---igordot.github.io/msigdbr/.

We can view the supported species and available collections using

msigdbr_species() # to see the supported species

# A tibble: 20 × 2
   species_name                    species_common_name                          
   <chr>                           <chr>                                        
 1 Anolis carolinensis             Carolina anole, green anole                  
 2 Bos taurus                      bovine, cattle, cow, dairy cow, domestic cat…
 3 Caenorhabditis elegans          <NA>                                         
 4 Canis lupus familiaris          dog, dogs                                    
 5 Danio rerio                     leopard danio, zebra danio, zebra fish, zebr…
 6 Drosophila melanogaster         fruit fly                                    
 7 Equus caballus                  domestic horse, equine, horse                
 8 Felis catus                     cat, cats, domestic cat                      
 9 Gallus gallus                   bantam, chicken, chickens, Gallus domesticus 
10 Homo sapiens                    human                                        
11 Macaca mulatta                  rhesus macaque, rhesus macaques, Rhesus monk…
12 Monodelphis domestica           gray short-tailed opossum                    
13 Mus musculus                    house mouse, mouse                           
14 Ornithorhynchus anatinus        duck-billed platypus, duckbill platypus, pla…
15 Pan troglodytes                 chimpanzee                                   
16 Rattus norvegicus               brown rat, Norway rat, rat, rats             
17 Saccharomyces cerevisiae        baker's yeast, brewer's yeast, S. cerevisiae 
18 Schizosaccharomyces pombe 972h- <NA>                                         
19 Sus scrofa                      pig, pigs, swine, wild boar                  
20 Xenopus tropicalis              tropical clawed frog, western clawed frog

msigdbr_collections() # to see available collections / subcollections

# A tibble: 25 × 4
   gs_collection gs_subcollection  gs_collection_name               num_genesets
   <chr>         <chr>             <chr>                                   <int>
 1 C1            ""                Positional                                302
 2 C2            "CGP"             Chemical and Genetic Perturbati…         3494
 3 C2            "CP"              Canonical Pathways                         19
 4 C2            "CP:BIOCARTA"     BioCarta Pathways                         292
 5 C2            "CP:KEGG_LEGACY"  KEGG Legacy Pathways                      186
 6 C2            "CP:KEGG_MEDICUS" KEGG Medicus Pathways                     658
 7 C2            "CP:PID"          PID Pathways                              196
 8 C2            "CP:REACTOME"     Reactome Pathways                        1736
 9 C2            "CP:WIKIPATHWAYS" WikiPathways                              830
10 C3            "MIR:MIRDB"       miRDB                                    2377
# ℹ 15 more rows

Note

The version of msigdbr now matches the version of msigdb queried. As of June 2025, the current version of MSigDB is v2025.1.Hs for human, so msigdbr is not using the most up-to-date version.

To retrieve all human gene sets from msigdb we use

#default is Homo sapiens
msigdbr(species = "Homo sapiens", db_species = "HS")

Choosing your gene sets

You should use your hypotheses to determine what gene sets to include in your analysis. As you increase the number of gene sets included in your analysis, the chances of false negatives will increase. The increased number of conducted tests increases the effect of multiple testing correction and reduces the multiple-test adjusted significance of individual pathways.

You can obtain specific gene sets using the collection and subcollection arguments of msigdbr. Here, we will explore the Hallmark collection.

hallmark gene sets are coherently expressed signatures derived by aggregating many MSigDB gene sets to represent well-defined biological states or processes. - MSigDB

hallmark<- msigdbr(collection = "H") %>% 
  dplyr::select(gs_name, ensembl_gene)
head(hallmark)

# A tibble: 6 × 2
  gs_name               ensembl_gene   
  <chr>                 <chr>          
1 HALLMARK_ADIPOGENESIS ENSG00000165029
2 HALLMARK_ADIPOGENESIS ENSG00000197150
3 HALLMARK_ADIPOGENESIS ENSG00000167315
4 HALLMARK_ADIPOGENESIS ENSG00000115361
5 HALLMARK_ADIPOGENESIS ENSG00000117054
6 HALLMARK_ADIPOGENESIS ENSG00000122971

h2 <- msigdbr(collection = "H") %>% 
  dplyr::select(gs_name, gs_description)

Alternatively, you could filter a data frame of all signatures using dplyr or base R syntax.

Run GSEA

Using MSigDB, we need to use the generic GSEA() from clusterProfiler.

By default, GSEA() will use FGSEA to run the analysis, which defaults to fgsea::fgsea() and in turn to fgsea::fgseaMultilevel(). Use ?GSEA() for the function documentation.

Arguments

Required arguments:
geneList - the ordered ranked gene list.
TERM2GENE - a data frame including the terms and genes (the custom gene sets, which in this case were from MSigDB).

Optional arguments:
minGSSize - the minimum size of a gene set to be analyzed (default = 10).
maxGSSize - the maximum size of a gene set to be analyzed (default = 500).
TERM2NAME - a data frame with the user input of TERM TO NAME mapping. You can include additional descriptions here.
pvalueCutoff - this controls the output in the resulting object. (default = 0.05).
seed - there is a random component to GSEA due to the gene permutations, and results will differ slightly with each run. To obtain consistent results, you can use set.seed outside of GSEA() and seed=TRUE within GSEA().

minGSSize and maxGSSize

The normalization of enrichment scores is "not very accurate for extremely small or extremely large gene sets. For example, for gene sets with fewer than 10 genes, just 2 or 3 genes can generate significant results.".

Other reasons for limiting large gene sets include:

The run time of the analysis will increase with larger gene sets.

Large gene sets are more general and therefore, more difficult to interpret.

"The GSEA null hypothesis is that the set is distributed uniformly at random along the gene ranking. For larger, structured gene sets that tends to be the case, even when there is no biological significance. In particular, artifacts from gene-gene expression correlation start to play an important role. --- Alexey Sergushichev, FGSEA author

by="DOSE

There is an alternative analysis option for GSEA (by="DOSE). This is the original implementation of GSEA. It is very slow and poorly documented. Use with caution.

#set seed
set.seed(123)
#run GSEA
eh <- GSEA(ranked, TERM2GENE = hallmark, TERM2NAME = h2, seed=TRUE)

using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).

preparing geneSet collections...

GSEA analysis...

Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.95% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.

Warning in fgseaMultilevel(pathways = pathways, stats = stats, minSize =
minSize, : For some of the pathways the P-values were likely overestimated. For
such pathways log2err is set to NA.

Warning in fgseaMultilevel(pathways = pathways, stats = stats, minSize =
minSize, : For some pathways, in reality P-values are less than 1e-10. You can
set the `eps` argument to zero for better estimation.

leading edge analysis...

done...

head(eh)

                                                                                   ID
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION
HALLMARK_ALLOGRAFT_REJECTION                             HALLMARK_ALLOGRAFT_REJECTION
HALLMARK_UV_RESPONSE_DN                                       HALLMARK_UV_RESPONSE_DN
HALLMARK_INTERFERON_GAMMA_RESPONSE                 HALLMARK_INTERFERON_GAMMA_RESPONSE
HALLMARK_ESTROGEN_RESPONSE_EARLY                     HALLMARK_ESTROGEN_RESPONSE_EARLY
HALLMARK_IL2_STAT5_SIGNALING                             HALLMARK_IL2_STAT5_SIGNALING
                                                                                                                               Description
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION Genes defining epithelial-mesenchymal transition, as in wound healing, fibrosis and metastasis.
HALLMARK_ALLOGRAFT_REJECTION                                                               Genes up-regulated during transplant rejection.
HALLMARK_UV_RESPONSE_DN                                                    Genes down-regulated in response to ultraviolet (UV) radiation.
HALLMARK_INTERFERON_GAMMA_RESPONSE                                                   Genes up-regulated in response to IFNG [GeneID=3458].
HALLMARK_ESTROGEN_RESPONSE_EARLY                                                                Genes defining early response to estrogen.
HALLMARK_IL2_STAT5_SIGNALING                                                   Genes up-regulated by STAT5 in response to IL2 stimulation.
                                           setSize enrichmentScore       NES
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION     191       0.5970040  2.256140
HALLMARK_ALLOGRAFT_REJECTION                   174      -0.6376018 -2.140916
HALLMARK_UV_RESPONSE_DN                        143       0.5191196  1.916448
HALLMARK_INTERFERON_GAMMA_RESPONSE             191      -0.5078924 -1.715399
HALLMARK_ESTROGEN_RESPONSE_EARLY               186       0.4543630  1.711498
HALLMARK_IL2_STAT5_SIGNALING                   194      -0.4900388 -1.656697
                                                 pvalue     p.adjust
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION 1.000000e-10 2.500000e-09
HALLMARK_ALLOGRAFT_REJECTION               1.000000e-10 2.500000e-09
HALLMARK_UV_RESPONSE_DN                    1.444408e-07 2.407347e-06
HALLMARK_INTERFERON_GAMMA_RESPONSE         2.022417e-06 2.112346e-05
HALLMARK_ESTROGEN_RESPONSE_EARLY           2.112346e-06 2.112346e-05
HALLMARK_IL2_STAT5_SIGNALING               1.639905e-05 1.366587e-04
                                                 qvalue rank
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION 1.526316e-09 3277
HALLMARK_ALLOGRAFT_REJECTION               1.526316e-09 2156
HALLMARK_UV_RESPONSE_DN                    1.469749e-06 3044
HALLMARK_INTERFERON_GAMMA_RESPONSE         1.289643e-05 5160
HALLMARK_ESTROGEN_RESPONSE_EARLY           1.289643e-05 2832
HALLMARK_IL2_STAT5_SIGNALING               8.343375e-05 3073
                                                             leading_edge
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION tags=50%, list=13%, signal=43%
HALLMARK_ALLOGRAFT_REJECTION                tags=36%, list=9%, signal=33%
HALLMARK_UV_RESPONSE_DN                    tags=31%, list=12%, signal=28%
HALLMARK_INTERFERON_GAMMA_RESPONSE         tags=48%, list=21%, signal=38%
HALLMARK_ESTROGEN_RESPONSE_EARLY           tags=28%, list=12%, signal=25%
HALLMARK_IL2_STAT5_SIGNALING               tags=28%, list=13%, signal=25%
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           core_enrichment
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION ENSG00000011465/ENSG00000182492/ENSG00000111341/ENSG00000166033/ENSG00000139329/ENSG00000142173/ENSG00000087245/ENSG00000180447/ENSG00000129038/ENSG00000109099/ENSG00000153071/ENSG00000142871/ENSG00000137801/ENSG00000166741/ENSG00000149968/ENSG00000133110/ENSG00000100234/ENSG00000135919/ENSG00000163359/ENSG00000130635/ENSG00000077942/ENSG00000019549/ENSG00000166923/ENSG00000171951/ENSG00000120708/ENSG00000118523/ENSG00000163430/ENSG00000111799/ENSG00000134871/ENSG00000176692/ENSG00000134013/ENSG00000132561/ENSG00000150630/ENSG00000122641/ENSG00000101825/ENSG00000168542/ENSG00000184347/ENSG00000112186/ENSG00000087303/ENSG00000106333/ENSG00000087494/ENSG00000065534/ENSG00000157227/ENSG00000164692/ENSG00000187498/ENSG00000006016/ENSG00000073712/ENSG00000124225/ENSG00000164932/ENSG00000115414/ENSG00000063660/ENSG00000140416/ENSG00000006327/ENSG00000172061/ENSG00000084636/ENSG00000106823/ENSG00000145147/ENSG00000152661/ENSG00000041982/ENSG00000150093/ENSG00000122786/ENSG00000171812/ENSG00000163520/ENSG00000198467/ENSG00000148848/ENSG00000170558/ENSG00000140092/ENSG00000124145/ENSG00000113083/ENSG00000163710/ENSG00000114251/ENSG00000169429/ENSG00000186340/ENSG00000163739/ENSG00000149257/ENSG00000107984/ENSG00000082781/ENSG00000073008/ENSG00000123384/ENSG00000143369/ENSG00000138685/ENSG00000112715/ENSG00000105290/ENSG00000146674/ENSG00000161638/ENSG00000128595/ENSG00000049540/ENSG00000080573/ENSG00000164176/ENSG00000134250/ENSG00000106397/ENSG00000113140/ENSG00000101335/ENSG00000164949/ENSG00000140937
HALLMARK_ALLOGRAFT_REJECTION                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               ENSG00000113520/ENSG00000254087/ENSG00000232810/ENSG00000015285/ENSG00000109943/ENSG00000125538/ENSG00000126759/ENSG00000204287/ENSG00000162711/ENSG00000115085/ENSG00000163519/ENSG00000169194/ENSG00000043462/ENSG00000172724/ENSG00000136634/ENSG00000140749/ENSG00000066336/ENSG00000104814/ENSG00000124882/ENSG00000100365/ENSG00000116824/ENSG00000118971/ENSG00000072694/ENSG00000147889/ENSG00000096996/ENSG00000054219/ENSG00000160791/ENSG00000019582/ENSG00000113263/ENSG00000111537/ENSG00000140030/ENSG00000060982/ENSG00000180644/ENSG00000153283/ENSG00000000938/ENSG00000100385/ENSG00000090339/ENSG00000137265/ENSG00000122862/ENSG00000113302/ENSG00000005844/ENSG00000154096/ENSG00000196735/ENSG00000185507/ENSG00000081237/ENSG00000163131/ENSG00000180353/ENSG00000182866/ENSG00000147168/ENSG00000121594/ENSG00000101017/ENSG00000105369/ENSG00000140968/ENSG00000137078/ENSG00000113532/ENSG00000204252/ENSG00000102962/ENSG00000275302/ENSG00000166501/ENSG00000114013/ENSG00000112799/ENSG00000134460/ENSG00000186810
HALLMARK_UV_RESPONSE_DN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    ENSG00000115461/ENSG00000170430/ENSG00000109099/ENSG00000153071/ENSG00000142871/ENSG00000169439/ENSG00000122420/ENSG00000019549/ENSG00000170961/ENSG00000166949/ENSG00000168542/ENSG00000112186/ENSG00000105976/ENSG00000164741/ENSG00000170425/ENSG00000147862/ENSG00000169047/ENSG00000164692/ENSG00000070778/ENSG00000125968/ENSG00000140443/ENSG00000049323/ENSG00000152661/ENSG00000117525/ENSG00000064309/ENSG00000196876/ENSG00000140092/ENSG00000058668/ENSG00000171132/ENSG00000010810/ENSG00000147852/ENSG00000182718/ENSG00000163513/ENSG00000105974/ENSG00000132170/ENSG00000180340/ENSG00000104067/ENSG00000173482/ENSG00000099250/ENSG00000005249/ENSG00000174738/ENSG00000115380/ENSG00000078053/ENSG00000134250/ENSG00000157110
HALLMARK_INTERFERON_GAMMA_RESPONSE                                                                         ENSG00000115415/ENSG00000158321/ENSG00000068079/ENSG00000104312/ENSG00000156587/ENSG00000119922/ENSG00000067057/ENSG00000169245/ENSG00000205220/ENSG00000133639/ENSG00000197329/ENSG00000184588/ENSG00000002549/ENSG00000077238/ENSG00000138378/ENSG00000131979/ENSG00000140853/ENSG00000164136/ENSG00000164305/ENSG00000100906/ENSG00000166710/ENSG00000174175/ENSG00000111335/ENSG00000135148/ENSG00000204257/ENSG00000168310/ENSG00000137959/ENSG00000111331/ENSG00000026103/ENSG00000177409/ENSG00000206503/ENSG00000184979/ENSG00000066583/ENSG00000157601/ENSG00000173193/ENSG00000124256/ENSG00000198805/ENSG00000119917/ENSG00000196116/ENSG00000073756/ENSG00000271503/ENSG00000116711/ENSG00000134470/ENSG00000111679/ENSG00000118640/ENSG00000213809/ENSG00000115267/ENSG00000162692/ENSG00000124762/ENSG00000216490/ENSG00000140280/ENSG00000115525/ENSG00000104432/ENSG00000130303/ENSG00000183347/ENSG00000137752/ENSG00000185338/ENSG00000132530/ENSG00000118503/ENSG00000127951/ENSG00000185201/ENSG00000174600/ENSG00000185215/ENSG00000121858/ENSG00000179583/ENSG00000196126/ENSG00000168899/ENSG00000128604/ENSG00000143184/ENSG00000043462/ENSG00000120217/ENSG00000162654/ENSG00000183486/ENSG00000019582/ENSG00000136514/ENSG00000100385/ENSG00000090339/ENSG00000137265/ENSG00000133106/ENSG00000196735/ENSG00000185507/ENSG00000110848/ENSG00000004468/ENSG00000101017/ENSG00000139626/ENSG00000140968/ENSG00000113532/ENSG00000100368/ENSG00000114013/ENSG00000026751/ENSG00000110324
HALLMARK_ESTROGEN_RESPONSE_EARLY                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           ENSG00000130558/ENSG00000163083/ENSG00000152137/ENSG00000087510/ENSG00000074410/ENSG00000111057/ENSG00000064205/ENSG00000197977/ENSG00000171346/ENSG00000003137/ENSG00000154153/ENSG00000155792/ENSG00000164741/ENSG00000170421/ENSG00000172137/ENSG00000054598/ENSG00000138119/ENSG00000150687/ENSG00000006534/ENSG00000125850/ENSG00000075275/ENSG00000185052/ENSG00000151892/ENSG00000140443/ENSG00000182704/ENSG00000164128/ENSG00000152661/ENSG00000168453/ENSG00000173156/ENSG00000147394/ENSG00000103257/ENSG00000196372/ENSG00000146242/ENSG00000113739/ENSG00000100003/ENSG00000071242/ENSG00000198682/ENSG00000170345/ENSG00000175793/ENSG00000068615/ENSG00000088002/ENSG00000187720/ENSG00000164742/ENSG00000171345/ENSG00000123066/ENSG00000152767/ENSG00000124839/ENSG00000136826/ENSG00000103067/ENSG00000162496/ENSG00000173227/ENSG00000100321/ENSG00000003989
HALLMARK_IL2_STAT5_SIGNALING                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               ENSG00000107968/ENSG00000099860/ENSG00000042493/ENSG00000069667/ENSG00000198959/ENSG00000100473/ENSG00000159399/ENSG00000117595/ENSG00000061918/ENSG00000185291/ENSG00000185338/ENSG00000127951/ENSG00000028137/ENSG00000123243/ENSG00000121858/ENSG00000025039/ENSG00000056558/ENSG00000132718/ENSG00000027075/ENSG00000120833/ENSG00000101160/ENSG00000135318/ENSG00000169194/ENSG00000049249/ENSG00000136634/ENSG00000186827/ENSG00000171791/ENSG00000186891/ENSG00000163508/ENSG00000118495/ENSG00000162654/ENSG00000147872/ENSG00000118971/ENSG00000115590/ENSG00000085563/ENSG00000169432/ENSG00000140030/ENSG00000196664/ENSG00000030419/ENSG00000100385/ENSG00000007312/ENSG00000120659/ENSG00000137265/ENSG00000188404/ENSG00000169291/ENSG00000140968/ENSG00000168421/ENSG00000120949/ENSG00000143891/ENSG00000117091/ENSG00000114013/ENSG00000163600/ENSG00000134460/ENSG00000110324/ENSG00000156127

Note

"As you can see from the warning, fgsea has a default lower bound eps=1e-10 for estimating P-values. If you need to estimate P-value more accurately, you can set the eps argument to zero in the fgsea function.".

Results

The resulting object eh is a gseaResult object. This object contains the results (eh@result) and other information that went into the analysis, for example, @organism type, @setType, the @geneSets used, the genes in our @geneList, and the @params used for the analysis. Note the use of @, which is used to access various slots within the object.

glimpse(eh)

Formal class 'gseaResult' [package "DOSE"] with 13 slots
  ..@ result     :'data.frame': 18 obs. of  11 variables:
  .. ..$ ID             : chr [1:18] "HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION" "HALLMARK_ALLOGRAFT_REJECTION" "HALLMARK_UV_RESPONSE_DN" "HALLMARK_INTERFERON_GAMMA_RESPONSE" ...
  .. ..$ Description    : chr [1:18] "Genes defining epithelial-mesenchymal transition, as in wound healing, fibrosis and metastasis." "Genes up-regulated during transplant rejection." "Genes down-regulated in response to ultraviolet (UV) radiation." "Genes up-regulated in response to IFNG [GeneID=3458]." ...
  .. ..$ setSize        : int [1:18] 191 174 143 191 186 194 178 189 169 181 ...
  .. ..$ enrichmentScore: num [1:18] 0.597 -0.638 0.519 -0.508 0.454 ...
  .. ..$ NES            : num [1:18] 2.26 -2.14 1.92 -1.72 1.71 ...
  .. ..$ pvalue         : num [1:18] 1.00e-10 1.00e-10 1.44e-07 2.02e-06 2.11e-06 ...
  .. ..$ p.adjust       : num [1:18] 2.50e-09 2.50e-09 2.41e-06 2.11e-05 2.11e-05 ...
  .. ..$ qvalue         : num [1:18] 1.53e-09 1.53e-09 1.47e-06 1.29e-05 1.29e-05 ...
  .. ..$ rank           : num [1:18] 3277 2156 3044 5160 2832 ...
  .. ..$ leading_edge   : chr [1:18] "tags=50%, list=13%, signal=43%" "tags=36%, list=9%, signal=33%" "tags=31%, list=12%, signal=28%" "tags=48%, list=21%, signal=38%" ...
  .. ..$ core_enrichment: chr [1:18] "ENSG00000011465/ENSG00000182492/ENSG00000111341/ENSG00000166033/ENSG00000139329/ENSG00000142173/ENSG00000087245"| __truncated__ "ENSG00000113520/ENSG00000254087/ENSG00000232810/ENSG00000015285/ENSG00000109943/ENSG00000125538/ENSG00000126759"| __truncated__ "ENSG00000115461/ENSG00000170430/ENSG00000109099/ENSG00000153071/ENSG00000142871/ENSG00000169439/ENSG00000122420"| __truncated__ "ENSG00000115415/ENSG00000158321/ENSG00000068079/ENSG00000104312/ENSG00000156587/ENSG00000119922/ENSG00000067057"| __truncated__ ...
  ..@ organism   : chr "UNKNOWN"
  ..@ setType    : chr "UNKNOWN"
  ..@ geneSets   :List of 50
  .. ..$ HALLMARK_ADIPOGENESIS                     : chr [1:200] "ENSG00000165029" "ENSG00000197150" "ENSG00000167315" "ENSG00000115361" ...
  .. ..$ HALLMARK_ALLOGRAFT_REJECTION              : chr [1:200] "ENSG00000090861" "ENSG00000164163" "ENSG00000136754" "ENSG00000087085" ...
  .. ..$ HALLMARK_ANDROGEN_RESPONSE                : chr [1:101] "ENSG00000125257" "ENSG00000140526" "ENSG00000123983" "ENSG00000072110" ...
  .. ..$ HALLMARK_ANGIOGENESIS                     : chr [1:36] "ENSG00000091583" "ENSG00000142192" "ENSG00000118971" "ENSG00000168542" ...
  .. ..$ HALLMARK_APICAL_JUNCTION                  : chr [1:200] "ENSG00000143632" "ENSG00000075624" "ENSG00000159251" "ENSG00000184009" ...
  .. ..$ HALLMARK_APICAL_SURFACE                   : chr [1:44] "ENSG00000137845" "ENSG00000006831" "ENSG00000169129" "ENSG00000118507" ...
  .. ..$ HALLMARK_APOPTOSIS                        : chr [1:161] "ENSG00000087274" "ENSG00000183773" "ENSG00000154122" "ENSG00000135046" ...
  .. ..$ HALLMARK_BILE_ACID_METABOLISM             : chr [1:112] "ENSG00000165029" "ENSG00000107331" "ENSG00000167972" "ENSG00000198691" ...
  .. ..$ HALLMARK_CHOLESTEROL_HOMEOSTASIS          : chr [1:74] "ENSG00000107331" "ENSG00000120437" "ENSG00000131069" "ENSG00000184009" ...
  .. ..$ HALLMARK_COAGULATION                      : chr [1:139] "ENSG00000175899" "ENSG00000168306" "ENSG00000168615" "ENSG00000214274" ...
  .. ..$ HALLMARK_COMPLEMENT                       : chr [1:201] "ENSG00000077522" "ENSG00000168615" "ENSG00000274286" "ENSG00000108599" ...
  .. ..$ HALLMARK_DNA_REPAIR                       : chr [1:150] "ENSG00000094914" "ENSG00000196839" "ENSG00000174233" "ENSG00000130706" ...
  .. ..$ HALLMARK_E2F_TARGETS                      : chr [1:200] "ENSG00000004455" "ENSG00000143401" "ENSG00000111875" "ENSG00000105011" ...
  .. ..$ HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION: chr [1:200] "ENSG00000154175" "ENSG00000107796" "ENSG00000148848" "ENSG00000166825" ...
  .. ..$ HALLMARK_ESTROGEN_RESPONSE_EARLY          : chr [1:200] "ENSG00000183044" "ENSG00000167972" "ENSG00000140526" "ENSG00000099204" ...
  .. ..$ HALLMARK_ESTROGEN_RESPONSE_LATE           : chr [1:200] "ENSG00000167972" "ENSG00000140526" "ENSG00000168306" "ENSG00000148700" ...
  .. ..$ HALLMARK_FATTY_ACID_METABOLISM            : chr [1:158] "ENSG00000109576" "ENSG00000060971" "ENSG00000167315" "ENSG00000115361" ...
  .. ..$ HALLMARK_G2M_CHECKPOINT                   : chr [1:200] "ENSG00000097007" "ENSG00000123505" "ENSG00000032219" "ENSG00000169136" ...
  .. ..$ HALLMARK_GLYCOLYSIS                       : chr [1:200] "ENSG00000115657" "ENSG00000170425" "ENSG00000162688" "ENSG00000188157" ...
  .. ..$ HALLMARK_HEDGEHOG_SIGNALING               : chr [1:36] "ENSG00000087085" "ENSG00000205336" "ENSG00000126016" "ENSG00000176749" ...
  .. ..$ HALLMARK_HEME_METABOLISM                  : chr [1:200] "ENSG00000115657" "ENSG00000118777" "ENSG00000213088" "ENSG00000102575" ...
  .. ..$ HALLMARK_HYPOXIA                          : chr [1:200] "ENSG00000144476" "ENSG00000148926" "ENSG00000170425" "ENSG00000162433" ...
  .. ..$ HALLMARK_IL2_STAT5_SIGNALING              : chr [1:199] "ENSG00000085563" "ENSG00000135074" "ENSG00000204305" "ENSG00000101444" ...
  .. ..$ HALLMARK_IL6_JAK_STAT3_SIGNALING          : chr [1:87] "ENSG00000175899" "ENSG00000135503" "ENSG00000139567" "ENSG00000030110" ...
  .. ..$ HALLMARK_INFLAMMATORY_RESPONSE            : chr [1:201] "ENSG00000165029" "ENSG00000136754" "ENSG00000135503" "ENSG00000121989" ...
  .. ..$ HALLMARK_INTERFERON_ALPHA_RESPONSE        : chr [1:97] "ENSG00000160710" "ENSG00000166710" "ENSG00000168062" "ENSG00000130303" ...
  .. ..$ HALLMARK_INTERFERON_GAMMA_RESPONSE        : chr [1:200] "ENSG00000160710" "ENSG00000221963" "ENSG00000150347" "ENSG00000122644" ...
  .. ..$ HALLMARK_KRAS_SIGNALING_DN                : chr [1:200] "ENSG00000073734" "ENSG00000172350" "ENSG00000159251" "ENSG00000184160" ...
  .. ..$ HALLMARK_KRAS_SIGNALING_UP                : chr [1:200] "ENSG00000085563" "ENSG00000159640" "ENSG00000151694" "ENSG00000151651" ...
  .. ..$ HALLMARK_MITOTIC_SPINDLE                  : chr [1:199] "ENSG00000136754" "ENSG00000097007" "ENSG00000159842" "ENSG00000130402" ...
  .. ..$ HALLMARK_MTORC1_SIGNALING                 : chr [1:200] "ENSG00000033050" "ENSG00000278540" "ENSG00000131473" "ENSG00000123983" ...
  .. ..$ HALLMARK_MYC_TARGETS_V1                   : chr [1:200] "ENSG00000164163" "ENSG00000143727" "ENSG00000106305" "ENSG00000177879" ...
  .. ..$ HALLMARK_MYC_TARGETS_V2                   : chr [1:58] "ENSG00000106305" "ENSG00000112578" "ENSG00000122565" "ENSG00000135446" ...
  .. ..$ HALLMARK_MYOGENESIS                       : chr [1:200] "ENSG00000099204" "ENSG00000087085" "ENSG00000151726" "ENSG00000143632" ...
  .. ..$ HALLMARK_NOTCH_SIGNALING                  : chr [1:32] "ENSG00000117362" "ENSG00000137486" "ENSG00000110092" "ENSG00000055130" ...
  .. ..$ HALLMARK_OXIDATIVE_PHOSPHORYLATION        : chr [1:200] "ENSG00000131269" "ENSG00000060971" "ENSG00000167315" "ENSG00000117054" ...
  .. ..$ HALLMARK_P53_PATHWAY                      : chr [1:200] "ENSG00000183044" "ENSG00000114770" "ENSG00000100439" "ENSG00000135503" ...
  .. ..$ HALLMARK_PANCREAS_BETA_CELLS              : chr [1:40] "ENSG00000006071" "ENSG00000117020" "ENSG00000100604" "ENSG00000077279" ...
  .. ..$ HALLMARK_PEROXISOME                       : chr [1:104] "ENSG00000085563" "ENSG00000005471" "ENSG00000150967" "ENSG00000114770" ...
  .. ..$ HALLMARK_PI3K_AKT_MTOR_SIGNALING          : chr [1:105] "ENSG00000278540" "ENSG00000138071" "ENSG00000115091" "ENSG00000078295" ...
  .. ..$ HALLMARK_PROTEIN_SECRETION                : chr [1:96] "ENSG00000165029" "ENSG00000137845" "ENSG00000143401" "ENSG00000166747" ...
  .. ..$ HALLMARK_REACTIVE_OXYGEN_SPECIES_PATHWAY  : chr [1:49] "ENSG00000103222" "ENSG00000177556" "ENSG00000121691" "ENSG00000129355" ...
  .. ..$ HALLMARK_SPERMATOGENESIS                  : chr [1:135] "ENSG00000159640" "ENSG00000111644" "ENSG00000134940" "ENSG00000148156" ...
  .. ..$ HALLMARK_TGF_BETA_SIGNALING               : chr [1:54] "ENSG00000115170" "ENSG00000134982" "ENSG00000054267" "ENSG00000137936" ...
  .. ..$ HALLMARK_TNFA_SIGNALING_VIA_NFKB          : chr [1:200] "ENSG00000165029" "ENSG00000144476" "ENSG00000109321" "ENSG00000162772" ...
  .. ..$ HALLMARK_UNFOLDED_PROTEIN_RESPONSE        : chr [1:113] "ENSG00000059573" "ENSG00000101199" "ENSG00000070669" "ENSG00000162772" ...
  .. ..$ HALLMARK_UV_RESPONSE_DN                   : chr [1:144] "ENSG00000103222" "ENSG00000121989" "ENSG00000148700" "ENSG00000117114" ...
  .. ..$ HALLMARK_UV_RESPONSE_UP                   : chr [1:158] "ENSG00000085563" "ENSG00000060971" "ENSG00000123908" "ENSG00000023330" ...
  .. ..$ HALLMARK_WNT_BETA_CATENIN_SIGNALING       : chr [1:42] "ENSG00000151694" "ENSG00000103126" "ENSG00000168646" "ENSG00000118971" ...
  .. ..$ HALLMARK_XENOBIOTIC_METABOLISM            : chr [1:200] "ENSG00000023839" "ENSG00000108846" "ENSG00000173208" "ENSG00000163686" ...
  ..@ geneList   : Named num [1:24456] 14.9 14.2 14 13.7 13.7 ...
  .. ..- attr(*, "names")= chr [1:24456] "ENSG00000169247" "ENSG00000130881" "ENSG00000087116" "ENSG00000128045" ...
  ..@ keytype    : chr "UNKNOWN"
  ..@ permScores : num[0 , 0 ] 
  ..@ params     :List of 6
  .. ..$ pvalueCutoff : num 0.05
  .. ..$ eps          : num 1e-10
  .. ..$ pAdjustMethod: chr "BH"
  .. ..$ exponent     : num 1
  .. ..$ minGSSize    : num 10
  .. ..$ maxGSSize    : num 500
  ..@ gene2Symbol: chr(0) 
  ..@ readable   : logi FALSE
  ..@ termsim    : num[0 , 0 ] 
  ..@ method     : chr(0) 
  ..@ dr         : list()

Because we used Ensembl IDs, we may be interested in viewing the output as gene symbols. We can translate the gene IDs to gene symbols using setReadable().

eh_gs <- setReadable(eh, OrgDb = org.Hs.eg.db, keyType="ENSEMBL")
eh_gs

#
# Gene Set Enrichment Analysis
#
#...@organism    UNKNOWN 
#...@setType     UNKNOWN 
#...@keytype     ENSEMBL 
#...@geneList    Named num [1:24456] 14.9 14.2 14 13.7 13.7 ...
 - attr(*, "names")= chr [1:24456] "ENSG00000169247" "ENSG00000130881" "ENSG00000087116" "ENSG00000128045" ...
#...nPerm    
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...18 enriched terms found
'data.frame':   18 obs. of  11 variables:
 $ ID             : chr  "HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION" "HALLMARK_ALLOGRAFT_REJECTION" "HALLMARK_UV_RESPONSE_DN" "HALLMARK_INTERFERON_GAMMA_RESPONSE" ...
 $ Description    : chr  "Genes defining epithelial-mesenchymal transition, as in wound healing, fibrosis and metastasis." "Genes up-regulated during transplant rejection." "Genes down-regulated in response to ultraviolet (UV) radiation." "Genes up-regulated in response to IFNG [GeneID=3458]." ...
 $ setSize        : int  191 174 143 191 186 194 178 189 169 181 ...
 $ enrichmentScore: num  0.597 -0.638 0.519 -0.508 0.454 ...
 $ NES            : num  2.26 -2.14 1.92 -1.72 1.71 ...
 $ pvalue         : num  1.00e-10 1.00e-10 1.44e-07 2.02e-06 2.11e-06 ...
 $ p.adjust       : num  2.50e-09 2.50e-09 2.41e-06 2.11e-05 2.11e-05 ...
 $ qvalue         : num  1.53e-09 1.53e-09 1.47e-06 1.29e-05 1.29e-05 ...
 $ rank           : num  3277 2156 3044 5160 2832 ...
 $ leading_edge   : chr  "tags=50%, list=13%, signal=43%" "tags=36%, list=9%, signal=33%" "tags=31%, list=12%, signal=28%" "tags=48%, list=21%, signal=38%" ...
 $ core_enrichment: chr  "DCN/BGN/MGP/HTRA1/LUM/COL6A2/MMP2/GAS1/LOXL1/PMP22/DAB2/CCN1/THBS1/NNMT/MMP3/POSTN/TIMP3/SERPINE2/COL6A3/COL5A1"| __truncated__ "IL4/LYN/TNF/WAS/CRTAM/IL1B/CFP/HLA-DRA/NLRP3/ZAP70/TRAT1/IL13/LCP2/CCL19/IL10/IGSF6/SPI1/MAP4K1/EREG/NCF4/CD2/C"| __truncated__ "IGFBP5/MGMT/PMP22/DAB2/CCN1/SDC2/PTGFR/SNAI2/HAS2/SMAD3/COL3A1/CAP2/MET/DLC1/ADORA2B/NFIB/IRS1/COL1A2/PTPN21/ID"| __truncated__ "STAT1/AUTS2/IFI35/RIPK2/UBE2L6/IFIT2/PFKP/CXCL10/PSMB10/BTG1/PELI1/PDE4B/LAP3/IL4R/STAT4/GCH1/NLRC5/IL15/CASP3/"| __truncated__ ...
#...Citation
S Xu, E Hu, Y Cai, Z Xie, X Luo, L Zhan, W Tang, Q Wang, B Liu, R Wang, W Xie, T Wu, L Xie, G Yu. Using clusterProfiler to characterize multiomics data. Nature Protocols. 2024, 19(11):3292-3320

Within our @result, we have:

ID - The term ID (gene set ID from MSigDB).

Description - A description of the term ID.

setSize - size of the pathway after removing genes not present in names(GeneList).

enrichmentScore - Same as in Broad GSEA implementation. The degree to which a set is over-represented at the top or bottom of the ranked list (negative = bottom, positive = top).

NES - allows comparability across gene sets by accounting for differences in gene set size and in correlations between gene sets and the expression dataset.

pvalue - Significance calculation (p-value) using a permutation test on gene labels.

p.adjust - a BH-adjusted p-value.

qvalue - another take on FDR adjusted p-values; See ?qvalue::qvalue.

rank - The position in the ranked list at which the maximum enrichment score occurred.

leading_edge -

tags - the percentage of genes contributing to the enrichment score
list - where in the list the enrichment score is attained
signal - enrichment signal strength

core_enrichment - Genes that contribute to the leading-edge subset within the gene set. This is the subset of genes that contributes most to the enrichment result.

glimpse(eh_gs@result)

Rows: 18
Columns: 11
$ ID              <chr> "HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION", "HALLMAR…
$ Description     <chr> "Genes defining epithelial-mesenchymal transition, as …
$ setSize         <int> 191, 174, 143, 191, 186, 194, 178, 189, 169, 181, 106,…
$ enrichmentScore <dbl> 0.5970040, -0.6376018, 0.5191196, -0.5078924, 0.454363…
$ NES             <dbl> 2.256140, -2.140916, 1.916448, -1.715399, 1.711498, -1…
$ pvalue          <dbl> 1.000000e-10, 1.000000e-10, 1.444408e-07, 2.022417e-06…
$ p.adjust        <dbl> 2.500000e-09, 2.500000e-09, 2.407347e-06, 2.112346e-05…
$ qvalue          <dbl> 1.526316e-09, 1.526316e-09, 1.469749e-06, 1.289643e-05…
$ rank            <dbl> 3277, 2156, 3044, 5160, 2832, 3073, 3683, 2769, 3261, …
$ leading_edge    <chr> "tags=50%, list=13%, signal=43%", "tags=36%, list=9%, …
$ core_enrichment <chr> "DCN/BGN/MGP/HTRA1/LUM/COL6A2/MMP2/GAS1/LOXL1/PMP22/DA…

Check out more on the GSEA function used here.

GSEA with GO

As mentioned, clusterProfiler also includes GSEA functions for specific functional databases. For example, we can look at the enrichment of terms from the Gene Ontology Consortium (GO) using gseGO().

For this function, we need to provide our ranked genes (geneList), the organism (OrgDb) object, the type of identifier used for the genes (keyType), and the GO namespace (one of "BP", "CC" and "MF"). keyType and the OrgDb object allow gseGO() to convert IDs as needed. Learn more about the organism annotation packages here.

#get keytype info for org.Hs.eg.db
keytypes(org.Hs.eg.db)

 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
 [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"    
[11] "GENETYPE"     "GO"           "GOALL"        "IPI"          "MAP"         
[16] "OMIM"         "ONTOLOGY"     "ONTOLOGYALL"  "PATH"         "PFAM"        
[21] "PMID"         "PROSITE"      "REFSEQ"       "SYMBOL"       "UCSCKG"      
[26] "UNIPROT"

#run gsea
gsea_go <- gseGO(geneList = ranked,
              OrgDb = org.Hs.eg.db,
              ont = "BP",
              keyType = "ENSEMBL",
              by="fgsea")

using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).

preparing geneSet collections...

GSEA analysis...

Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.95% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.

Warning in fgseaMultilevel(pathways = pathways, stats = stats, minSize =
minSize, : For some of the pathways the P-values were likely overestimated. For
such pathways log2err is set to NA.

Warning in fgseaMultilevel(pathways = pathways, stats = stats, minSize =
minSize, : For some pathways, in reality P-values are less than 1e-10. You can
set the `eps` argument to zero for better estimation.

leading edge analysis...

done...

gsea_go

#
# Gene Set Enrichment Analysis
#
#...@organism    Homo sapiens 
#...@setType     BP 
#...@keytype     ENSEMBL 
#...@geneList    Named num [1:24456] 14.9 14.2 14 13.7 13.7 ...
 - attr(*, "names")= chr [1:24456] "ENSG00000169247" "ENSG00000130881" "ENSG00000087116" "ENSG00000128045" ...
#...nPerm    
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...649 enriched terms found
'data.frame':   649 obs. of  11 variables:
 $ ID             : chr  "GO:0050853" "GO:0035107" "GO:0035108" "GO:0050851" ...
 $ Description    : chr  "B cell receptor signaling pathway" "appendage morphogenesis" "limb morphogenesis" "antigen receptor-mediated signaling pathway" ...
 $ setSize        : int  70 132 132 185 107 107 418 165 165 205 ...
 $ enrichmentScore: num  -0.794 0.64 0.64 -0.684 0.638 ...
 $ NES            : num  -2.38 2.33 2.33 -2.3 2.26 ...
 $ pvalue         : num  1.00e-10 1.00e-10 1.00e-10 1.00e-10 1.02e-10 ...
 $ p.adjust       : num  1.03e-08 1.03e-08 1.03e-08 1.03e-08 1.03e-08 ...
 $ qvalue         : num  8.52e-09 8.52e-09 8.52e-09 8.52e-09 8.52e-09 ...
 $ rank           : num  3377 3042 3042 3436 4025 ...
 $ leading_edge   : chr  "tags=67%, list=14%, signal=58%" "tags=42%, list=12%, signal=37%" "tags=42%, list=12%, signal=37%" "tags=42%, list=14%, signal=36%" ...
 $ core_enrichment: chr  "ENSG00000111679/ENSG00000171608/ENSG00000110031/ENSG00000087088/ENSG00000197943/ENSG00000137101/ENSG00000081189"| __truncated__ "ENSG00000163064/ENSG00000006377/ENSG00000120149/ENSG00000105880/ENSG00000162878/ENSG00000066468/ENSG00000151067"| __truncated__ "ENSG00000163064/ENSG00000006377/ENSG00000120149/ENSG00000105880/ENSG00000162878/ENSG00000066468/ENSG00000151067"| __truncated__ "ENSG00000182389/ENSG00000213658/ENSG00000111679/ENSG00000107485/ENSG00000171608/ENSG00000110031/ENSG00000131981"| __truncated__ ...
#...Citation
S Xu, E Hu, Y Cai, Z Xie, X Luo, L Zhan, W Tang, Q Wang, B Liu, R Wang, W Xie, T Wu, L Xie, G Yu. Using clusterProfiler to characterize multiomics data. Nature Protocols. 2024, 19(11):3292-3320

Plot GSEA results

enrichplot offers several functions specific to GSEA output, especially in regard to visualizing the running score and pre-ranked list (gseaplot() and gseaplot2()). gseaplot2() allows us to visualize several enriched gene sets at once.

enrichplot::gseaplot2(eh_gs, geneSetID = c(1,3,5))

enrichplot::gseaplot2(eh_gs, geneSetID = c(2,4))

We can also use ggplot2 helpers to easily customize figures, or you can save the results to a data frame and build any custom figure.

#plot using a barplot
ggplot(eh_gs, showCategory=10, aes(NES, fct_reorder(Description, NES),
fill=p.adjust)) +
  geom_col() +
  geom_vline(xintercept=0, linetype="dashed", color="blue",size=1)+
  scale_fill_gradientn(colours=c("#b3eebe","#46bac2", "#371ea3"),
                       guide=guide_colorbar(reverse=TRUE))+
  scale_x_continuous(expand=c(0,0))+
  theme_bw() + 
  xlab("Normalized Enrichment Score") +
  ylab(NULL) +
  ggtitle("GSEA (Hallmark)")

See further options for visualization here and here.