Gene Set Enrichment Analysis

Definition

"The goal of GSEA is to determine whether members of a gene set S tend to occur toward the top (or bottom) of the list L, in which case the gene set is correlated with the phenotypic class distinction." -- Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles

The input for GSEA is a normalized gene expression matrix. In this example, the hallmark gene set collection will be used.

Tip

A positive enrichment score indicates that the gene set is enriched in the condition entered as the numerator during setup. A negative enrichment score, on the other hand, indicates that the gene set is enriched in the condition entered as the denominator during setup.

Click the GSEA data node to view the results table after the task has completed. From this table, users can open the enrichment view and summary for each gene set, and filter the results.

The enrichment plot for the hedgehog signaling gene set is shown below. It indicates that this gene set is enriched in the tumor samples (normalized enrichment score of 2.04).

The enrichment summary report lists the genes in the hedgehog gene set that occur in the ranked expression list, with the leading edge gene being CELSR1.

Refer to the following descriptions to learn how to interpret GSEA results in Partek Flow.

Enrichment score. The algorithm walks down the ranked list of all the genes in the model, increasing the running sum (y axis) each time a gene in the current gene set is encountered. Conversely, the running sum is decreased each time a gene not in the current gene set is encountered. The magnitude of the increment depends on the correlation of the gene with the experimental factor. The enrichment score is then the maximum deviation from zero encountered in the random walk (the summit of the curve, or its lowest point when the score is negative).
Gene set hits. Each column shows the location of a gene from the current gene set, within the ranked list of all the genes in the model.
Rank metric. The plot shows the value of the ranking metric (y axis) as you move down the ranked list of all the genes in the model (x axis). The ranking metric measures a gene’s correlation with a phenotype. A positive value of the metric indicates correlation with the first category (Numerator) and a negative value indicates correlation with the second category (Denominator).
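
The running-sum calculation described under "Enrichment score" can be sketched in a few lines of Python. This is only an illustrative sketch, not Partek Flow's implementation: the function name enrichment_score, the weight exponent p, and the toy gene names and metric values are all hypothetical. It assumes the weighting scheme from the GSEA paper quoted above, where each hit increases the sum in proportion to the absolute value of its rank metric and each miss decreases it by a constant amount.

```python
def enrichment_score(ranked_genes, rank_metric, gene_set, p=1.0):
    """Return (enrichment score, running-sum curve) for one gene set.

    ranked_genes: gene names sorted by rank metric, highest first
    rank_metric:  metric values in the same order as ranked_genes
    gene_set:     collection of gene names belonging to the set
    p:            weight exponent (assumed value, not taken from the source)
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_miss = in_set.count(False)
    # Hits are weighted by |metric|**p; misses carry no weight here.
    hit_weights = [abs(m) ** p if hit else 0.0
                   for m, hit in zip(rank_metric, in_set)]
    total_hit_weight = sum(hit_weights)
    if total_hit_weight == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running = 0.0
    curve = []
    for weight, hit in zip(hit_weights, in_set):
        if hit:
            running += weight / total_hit_weight  # step up on a hit
        else:
            running -= 1.0 / n_miss               # step down on a miss
        curve.append(running)

    # The enrichment score is the point of maximum deviation from zero.
    es = max(curve, key=abs)
    return es, curve


# Toy ranked list (hypothetical genes and metric values).
genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]

es_top, curve_top = enrichment_score(genes, metric, {"G1", "G2"})
es_bottom, curve_bottom = enrichment_score(genes, metric, {"G5", "G6"})
print(es_top, es_bottom)
```

In this toy example, the set whose members sit at the top of the ranked list gets a positive score and the set at the bottom gets a negative one, which matches the numerator/denominator interpretation given in the tip above. The normalized enrichment score reported in the results table involves an additional normalization step that this sketch does not attempt.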