Currently we support running a number of feature (gene) set enrichment
analyses downstream of another FacileAnalysisResult
(i.e., [fdge() | fpca()] %>% ffsea()), or over an arbitrary data.frame of
feature-level statistics.
Please refer to the examples here, as well as the "Feature Set Analysis" section of the vignette, for more information.
ffsea(x, fsets, methods = NULL, ...)
# S3 method for FacileFseaAnalysisResult
result(x, name = "object", ...)
A FacileAnalysisResult object, or a data.frame with feature-level
statistics, minimally with a "feature_id" column as well as one or more
numeric columns to rank features on.
The feature sets (likely genesets) to use for testing. This
object will be passed through the sparrow::GeneSetDb() constructor to
create a GeneSetDb object that will be used for testing.
A character vector of GSEA methods to use on x. Choose any of the method
names listed by running ffsea_methods(x). If NULL (default), the first
method from ffsea_methods(x) will be selected.
A FacileFseaAnalysisResult object, which includes a SparrowResult
object as its result(). The geneset level statistics for each of the
methods that were run are available via tidy(ffsea.res, "<method_name>").
When running ffsea() over a FacileAnalysisResult, the types of methods
that can be run, and their configuration, are preconfigured with reasonable
defaults.
When providing a generic data.frame of feature-level statistics to run
enrichment tests over, the user has to specify a few more parameters.
If running a "pre-ranked" test (method %in% c("cameraPR", "fgsea")), the
name of a numeric column in x to rank the features by must be specified,
along with the order in which to rank them, using the rank_by and rank_order
arguments, respectively.
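As a rough sketch of the pre-ranked case, the snippet below builds a toy data.frame of feature-level statistics with base R and shows the ordering that a ranking column implies. The feature identifiers, the `logFC` column name, and the `run_preranked()` helper are hypothetical, and the `rank_order = "descending"` value is an assumption: check the documented choices for that argument before relying on it.

```r
# Toy table of feature-level statistics (all values invented for illustration).
stats <- data.frame(
  feature_id = c("g1", "g2", "g3", "g4", "g5"),
  logFC      = c(1.8, -0.4, 0.9, -2.1, 0.1),
  stringsAsFactors = FALSE
)

# A pre-ranked test only needs the features ordered by one numeric column.
ranked <- stats[order(stats$logFC, decreasing = TRUE), ]
ranked$feature_id

# Hypothetical wrapper. It assumes ffsea() is available on the search path and
# that `gdb` is a GeneSetDb whose feature ids match stats$feature_id.
run_preranked <- function(stats, gdb) {
  ffsea(stats, gdb, methods = "cameraPR",
        rank_by = "logFC",
        rank_order = "descending")  # assumed value, not confirmed by this page
}
```

The wrapper is only defined here, not called, so the snippet runs without any gene set collection at hand.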
If running an overrepresentation analysis-style test (method = "ora"),
the user must specify the name of a logical column (select_by) that indicates
(when TRUE) that a feature should be included for enrichment testing. The
user can optionally specify a group_by column, like "direction", that
will be used to split the selected features into groups to perform more
specific enrichment tests. This allows enrichment tests to be run
separately for "up" and "down" regulated genes, for example.
Lastly, the user can provide the name of another numeric column in x with
biased_by, which can be used to account for bias in the enrichment tests,
such as gene length, GC content, etc.
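To make the overrepresentation inputs concrete, here is a minimal sketch of a statistics table that carries a logical selection column, a grouping column, and a bias covariate. All column names other than `feature_id`, the 0.05 cutoff, and the `run_ora()` helper are assumptions made for illustration; only the `select_by`, `group_by`, and `biased_by` argument names come from the text above.

```r
# Toy differential expression table (values invented for illustration).
stats <- data.frame(
  feature_id       = paste0("g", 1:6),
  logFC            = c(2.1, -1.7, 0.2, 1.4, -0.1, -2.5),
  padj             = c(0.001, 0.01, 0.80, 0.03, 0.95, 0.002),
  effective_length = c(1200, 800, 3000, 1500, 600, 2200),
  stringsAsFactors = FALSE
)

# Logical column marking the features to include in the enrichment test.
stats$significant <- stats$padj < 0.05
# Character column used to split the selected features into groups.
stats$direction <- ifelse(stats$logFC > 0, "up", "down")

# The groups that a group_by = "direction" split would test separately.
selected <- stats[stats$significant, ]
groups <- split(selected$feature_id, selected$direction)
groups

# Hypothetical wrapper. It assumes ffsea() is available on the search path and
# that `gdb` is a GeneSetDb whose feature ids match stats$feature_id.
run_ora <- function(stats, gdb) {
  ffsea(stats, gdb, methods = "ora",
        select_by = "significant",
        group_by  = "direction",
        biased_by = "effective_length")
}
```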
Gene sets must be supplied as a sparrow::GeneSetDb() object.
Currently, only the following GSEA methods are supported:
"cameraPR": Delegates to limma::cameraPR() to perform a competitive
gene set test based on feature ranks imposed downstream of an analysis.
"fgsea": Delegates to fgsea::fgsea() to perform another version of
a competitive gene set test based on ranks.
"ora": Performs an overrepresentation analysis test. The user
must specify the name of a logical column (select_by) from the input,
which is used to indicate the features that are selected for enrichment
analysis. The user can optionally provide the name of a numeric column
(biased_by) and a character column (group_by). The former adjusts the
enrichment test for a covariate that may induce a bias in the DGE results,
and the latter runs follow-up enrichment tests on different groups of
features. For example, the result table might have a "direction" column,
which specifies the direction of differential expression ("up" or "down").
In this case, enrichment tests will be run over all features together, and
then independently for the ones that are "up" and "down".
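For intuition about what an overrepresentation test measures, the sketch below runs a plain hypergeometric test for a single gene set in base R. This is a simplified stand-in, not the procedure that `"ora"` delegates to: it ignores any `biased_by` covariate, and the universe, selection, and gene set are invented for illustration.

```r
# All features that were measured (the test universe).
universe <- paste0("g", 1:20)
# Features flagged for enrichment testing (the select_by == TRUE rows).
selected <- paste0("g", 1:6)
# One hypothetical gene set.
geneset <- c("g1", "g2", "g3", "g4", "g15")

k <- length(intersect(selected, geneset))  # selected features in the set
m <- length(intersect(universe, geneset))  # set members in the universe
n <- length(universe) - m                  # universe features outside the set
N <- length(selected)                      # number of selected features

# P(overlap >= k) when N features are drawn at random from the universe.
pval <- phyper(k - 1, m, n, N, lower.tail = FALSE)
pval
```

A small p-value indicates that the selected features overlap the gene set more than random selection would explain. Running the same calculation on the "up" and "down" subsets separately mirrors what a `group_by` split does.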
The geneset level statistics can be extracted from the
FacileFseaAnalysisResult on a per-method basis using the tidy() function.
For instance, if ffsea() was called with
fres <- ffsea(..., methods = c("cameraPR", "ora")), the "cameraPR"
results can be extracted via tidy(fres, "cameraPR").
This functionality delegates to sparrow::seas() to do all of the work. The
sparrow::seas() interface is undergoing a bit of refactoring in order to better
support a table of feature statistics as input (for preranked and enrichment
tests), so the "methods" supported via ffsea() are limited to a subset
of the ones wrapped by sparrow::seas(), as enumerated in the list of
supported methods.
We are in a bit of an inconsistent state right now, where tidy() is
the de facto way to retrieve "tidy"-like results (instead of result()).
This is not to say that result() can't also return something that's
"tidy", but in this case, result(ffsea.result) will return the
SparrowResult object itself, and tidy(ffsea.result) will dispatch
to sparrow::result() to fetch the GSEA statistics for the method
requested.
mgres <- result(ffsea.res) # returns the SparrowResult object
camera.stats <- tidy(ffsea.res, name = "cameraPR")
https://github.com/lianos/sparrow
gdb <- sparrow::exampleGeneSetDb()
efds <- FacileData::exampleFacileDataSet()
# GSEA from t-test result ---------------------------------------------------
ttest.res <- efds %>%
  FacileData::filter_samples(indication == "CRC") %>%
  flm_def(covariate = "sample_type", numer = "tumor", denom = "normal",
          batch = "sex") %>%
  fdge(method = "voom")
ttest.gsea <- ffsea(ttest.res, gdb, methods = c("cameraPR", "ora"),
                    biased_by = "effective_length")
if (interactive()) {
  viz(ttest.gsea, type = "density", name = "HALLMARK_HEDGEHOG_SIGNALING")
  viz(ttest.gsea, type = "gsea", name = "HALLMARK_HEDGEHOG_SIGNALING")
  # Explore the result computed above
  shine(ttest.gsea)
  # Alternatively, configure and run the analysis interactively
  ttest.igsea <- ffseaGadget(ttest.res, gdb)
}
camera.stats <- tidy(ttest.gsea, "cameraPR")
ora.stats <- tidy(ttest.gsea, "ora")
# GSEA from ANOVA result ----------------------------------------------------
stage.anova <- efds %>%
  FacileData::filter_samples(indication == "BLCA") %>%
  flm_def(covariate = "stage", batch = "sex") %>%
  fdge(method = "voom")
anova.gsea <- ffsea(stage.anova, gdb)
if (interactive()) {
  # TODO: shine(anova.gsea) doesn't work
  shine(anova.gsea)
  # We can generate the same GSEA result like so
  anova.gsea2 <- ffseaGadget(stage.anova, gdb = gdb)
}
# GSEA over loadings on a Principal Component -------------------------------
pca.crc <- efds %>%
  FacileData::filter_samples(indication == "CRC") %>%
  fpca()
#> Loading required namespace: testthat
pca1.gsea <- ffsea(pca.crc, gdb, dim = 1)
#> Warning: Fraction of gene set IDs that match rownames in the expression object are low: 4.22%
#> Warning: Deactivating 18 gene sets because conformation of GeneSetDb to the target creates gene sets smaller than 2 or greater than Inf