Performs Feature (Gene) Set Enrichment Analyses

Currently we support running a number of feature (gene) set enrichment analyses downstream of some other FacileAnalysisResult (i.e., [fdge() | fpca()] %>% ffsea()), or over an arbitrary data.frame of feature-level statistics.

Please refer to the examples here, as well as the "Feature Set Analysis" section of the vignette, for more information.

ffsea(x, fsets, methods = NULL, ...)

# S3 method for FacileFseaAnalysisResult
result(x, name = "object", ...)

Arguments

x: A FacileAnalysisResult object, or a data.frame with feature-level statistics, minimally with a "feature_id" column as well as one or more numeric columns to rank features on.
fsets: The feature sets (likely genesets) to use for testing. This object will be passed through the sparrow::GeneSetDb() constructor to create a GeneSetDb object that will be used for testing.
methods: A character vector of GSEA methods to use on x. Choose any of the method names listed by running ffsea_methods(x). If NULL (default), the first method from ffsea_methods(x) will be selected.

Value

A FacileFseaAnalysisResult object, which includes a SparrowResult object as its result(). The geneset level statistics for each of the methods that were run are available via tidy(ffsea.res, "<method_name>").

Details

When running ffsea over a FacileAnalysisResult, the types of methods that can be run, and their configuration, are preconfigured with reasonable defaults.

When providing a generic data.frame of feature-level statistics to run enrichment tests over, the user has to specify a few more parameters. If running a "pre-ranked" test (method %in% c("cameraPR", "fgsea")), the name of a numeric column in x to rank the features by, and the order in which to rank them, must be specified using the rank_by and rank_order arguments, respectively.
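As a minimal sketch of the data.frame input described above, the following builds a table with the required "feature_id" column and one numeric column to rank on. All feature IDs, statistics, and gene sets are made up for illustration. The final call assumes that ffsea() is exported by the FacileAnalysis package, that sparrow::GeneSetDb() accepts a long data.frame with collection, name, and feature_id columns, and that rank_order can be left at its default; it is guarded so that the data preparation runs even when those packages are not installed.

```r
# Hypothetical feature-level statistics: IDs and values are made up.
set.seed(42)
n <- 200
feature_stats <- data.frame(
  feature_id = sprintf("gene%03d", seq_len(n)),
  logFC = rnorm(n),
  stringsAsFactors = FALSE
)

# Hypothetical feature sets in long format (collection, name, feature_id).
toy_sets <- data.frame(
  collection = "toy",
  name = rep(c("set_a", "set_b"), each = 20),
  feature_id = feature_stats$feature_id[c(1:20, 101:120)],
  stringsAsFactors = FALSE
)

# The input needs a "feature_id" column plus a numeric column to rank on,
# and the feature sets should reference IDs present in the table.
stopifnot(
  "feature_id" %in% names(feature_stats),
  is.numeric(feature_stats$logFC),
  all(toy_sets$feature_id %in% feature_stats$feature_id)
)

# The enrichment call itself needs FacileAnalysis and sparrow (assumed names).
if (requireNamespace("FacileAnalysis", quietly = TRUE) &&
    requireNamespace("sparrow", quietly = TRUE)) {
  gdb <- sparrow::GeneSetDb(toy_sets)
  # rank_by names the ranking column; rank_order is left at its default here.
  preranked_res <- FacileAnalysis::ffsea(feature_stats, gdb,
                                         methods = "cameraPR",
                                         rank_by = "logFC")
}
```

Random statistics like these should not produce meaningful enrichment; the point is only the shape of the input and which argument names the ranking column.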

If running an overrepresentation analysis-style test (method = "ora"), the user must specify the name of a logical column that indicates (when TRUE) that a feature should be included for enrichment testing. The user can optionally specify a group_by column, like "direction", that will be used to split the selected features into groups to perform more specific enrichment tests. This allows enrichment tests to be run separately for "up" and "down" regulated genes, for example.

Lastly, the user can provide the name of another numeric column in x via biased_by, which can be used to account for bias in the enrichment tests, such as gene length, GC content, etc.
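The columns an overrepresentation-style test expects can be sketched as follows: a logical selection column, an optional grouping column, and an optional numeric bias column. Every value is simulated, and the column names significant, direction, and effective_length are illustrative choices. The guarded call assumes that ffsea() is exported by the FacileAnalysis package, that the selection column is passed via select_by (the argument name given under "GSEA Methods"), and that sparrow::GeneSetDb() accepts a long data.frame of gene set definitions.

```r
# Hypothetical differential expression table: every value here is made up.
set.seed(7)
n <- 300
dge_stats <- data.frame(
  feature_id = sprintf("gene%03d", seq_len(n)),
  logFC = rnorm(n, sd = 1.5),
  padj = runif(n),
  effective_length = round(runif(n, min = 500, max = 5000)),
  stringsAsFactors = FALSE
)

# Logical selection column: TRUE marks features to include in the ORA test.
dge_stats$significant <- dge_stats$padj < 0.2 & abs(dge_stats$logFC) > 1

# Grouping column: lets selected features be tested per direction as well.
dge_stats$direction <- ifelse(dge_stats$logFC > 0, "up", "down")

# Number of selected features that would fall into each group.
selected_per_group <- table(dge_stats$direction[dge_stats$significant])
print(selected_per_group)

# The enrichment call itself needs FacileAnalysis and sparrow (assumed names).
if (requireNamespace("FacileAnalysis", quietly = TRUE) &&
    requireNamespace("sparrow", quietly = TRUE)) {
  # Hypothetical feature sets in long format (collection, name, feature_id).
  toy_sets <- data.frame(
    collection = "toy",
    name = rep(c("set_a", "set_b"), each = 30),
    feature_id = dge_stats$feature_id[c(1:30, 151:180)],
    stringsAsFactors = FALSE
  )
  gdb <- sparrow::GeneSetDb(toy_sets)
  ora_res <- FacileAnalysis::ffsea(dge_stats, gdb, methods = "ora",
                                   select_by = "significant",
                                   group_by = "direction",
                                   biased_by = "effective_length")
}
```

Checking the per-group counts before running the test is a cheap way to spot a group with too few selected features to test sensibly.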

Gene sets must be supplied as a sparrow::GeneSetDb() object.

GSEA Methods

Currently, only the following GSEA methods are supported:

"cameraPR": Delegates to limma::cameraPR() to perform a competitive gene set test based on feature ranks imposed downstream of an analysis.
"fgsea": Delegates to fgsea::fgsea() to perform another version of a competitive gene set test based on ranks.
"ora": Performs an overrepresentation analysis test. The user must specify the name of a logical column (select_by) from the input, which is used to indicate the features that are selected for enrichment analysis. The user can optionally provide the name of a numeric column (biased_by) and a character column (group_by). The former adjusts the enrichment test for a covariate that may induce a bias in the DGE results; the latter runs follow-up enrichment tests over different groups of features. For example, the result table might have a "direction" column, which specifies the direction of differential expression ("up" or "down"). In this case, enrichment tests will be run over all features together, and then independently for the ones that are "up" and "down".

GSEA Statistics

The geneset level statistics can be extracted from the FacileFseaAnalysisResult on a per-method basis using the tidy() function. For instance, if ffsea() was called with fres <- ffsea(..., methods = c("cameraPR", "ora")), the "cameraPR" results can be extracted via tidy(fres, "cameraPR").

Development Notes

This functionality delegates to sparrow::seas() to do all of the work. The sparrow::seas() interface is undergoing a bit of refactoring in order to better support a table of feature statistics as input (for preranked and enrichment tests), so the "methods" supported via ffsea() are limited to a subset of the ones wrapped by sparrow::seas(), as enumerated in the "GSEA Methods" section.

Accessing Results

We are in a bit of an inconsistent state right now, where tidy() is the de facto way to retrieve "tidy"-like results (instead of result()).

This is not to say that result() can't also return something that's "tidy", but in this case, result(ffsea.result) will return the SparrowResult object itself, and tidy(ffsea.result) will dispatch to sparrow::result() to fetch the GSEA statistics for the method requested.

mgres <- result(ffsea.res) # returns the SparrowResult object
camera.stats <- tidy(ffsea.res, name = "cameraPR")

Examples

gdb <- sparrow::exampleGeneSetDb()
efds <- FacileData::exampleFacileDataSet()

# GSEA from t-test result ---------------------------------------------------
ttest.res <- efds %>%
  FacileData::filter_samples(indication == "CRC") %>%
  flm_def(covariate = "sample_type", numer = "tumor", denom = "normal",
          batch = "sex") %>%
  fdge(method = "voom")

ttest.gsea <- ffsea(ttest.res, gdb, methods = c("cameraPR", "ora"),
                    biased_by = "effective_length")
if (interactive()) {
  viz(ttest.gsea, type = "density", name = "HALLMARK_HEDGEHOG_SIGNALING")
  viz(ttest.gsea, type = "gsea", name = "HALLMARK_HEDGEHOG_SIGNALING")

  shine(ttest.gsea)
  ttest.igsea <- ffseaGadget(ttest.res, gdb)
}

camera.stats <- tidy(ttest.gsea, "cameraPR")
ora.stats <- tidy(ttest.gsea, "ora")

# GSEA from ANOVA result ----------------------------------------------------
stage.anova <- efds %>%
  FacileData::filter_samples(indication == "BLCA") %>%
  flm_def(covariate = "stage", batch = "sex") %>%
  fdge(method = "voom")
anova.gsea <- ffsea(stage.anova, gdb)
if (interactive()) {
  # TODO: shine(anova.gsea) doesn't work
  shine(anova.gsea)
  # We can generate the same GSEA result like so
  anova.gsea2 <- ffseaGadget(stage.anova, gdb = gdb)
}

# GSEA over loadings on a Principal Component -------------------------------
pca.crc <- efds %>%
  FacileData::filter_samples(indication == "CRC") %>%
  fpca()
#> Loading required namespace: testthat
pca1.gsea <- ffsea(pca.crc, gdb, dim = 1)
#> Warning: Fraction of gene set IDs that match rownames in the expression object are low: 4.22% 
#> Warning: Deactivating 18 gene sets because conformation of GeneSetDb to the target creates gene sets smaller than 2 or greater than Inf