Performs a principal components analysis over a specified assay from the (subset of) samples in a FacileDataStore.
# S3 method for FacilePcaAnalysisResult
compare(x, y, run_all = TRUE, rerun = TRUE, ...)
fpca(
x,
assay_name = NULL,
dims = 5,
features = NULL,
filter = "variance",
ntop = 1000,
row_covariates = NULL,
col_covariates = NULL,
batch = NULL,
main = NULL,
...
)
# S3 method for facile_frame
fpca(
x,
assay_name = NULL,
dims = min(5, nrow(collect(x, n = Inf)) - 1L),
features = NULL,
filter = "variance",
ntop = 1000,
row_covariates = NULL,
col_covariates = NULL,
batch = NULL,
main = NULL,
custom_key = Sys.getenv("USER"),
...
)
# S3 method for matrix
fpca(
x,
dims = min(5, ncol(x) - 1L),
features = NULL,
filter = "default",
ntop = 1000,
row_covariates = NULL,
col_covariates = NULL,
batch = NULL,
main = NULL,
use_irlba = dims < 7,
center = TRUE,
scale. = FALSE,
...
)
a facile data container (FacileDataSet), or a facile_frame
(refer to the FacileDataStore (facile_frame) section).
when rerun = TRUE (default), fpca(x) and fpca(y) will
be rerun over the union of the features in x and y.
the name of the assay to extract data from to perform the
PCA. If not specified, default assays are taken for each type of assay
container (ie. default_assay(facile container), "counts" for a
DGEList, assayNames(SummarizedExperiment)[1L], etc.)
the number of PCs to calculate (minimum is 3).
A feature descriptor of the features to use for the analysis.
If NULL (default), then the specified filter strategy is used.
A strategy used to identify which features to use for the
dimensionality reduction. The current (and only) choice is "default",
which takes the ntop features, sorted by decreasing variance.
the number of features (genes) to include in the PCA. Genes are
ranked by decreasing variance across the samples in x.
data.frames that provide meta information
for the features (rows) and samples (columns). The default is to get
these values from "the obvious places" given x ($genes and $samples
for a DGEList, or the sample and feature-level covariate database tables
from a FacileDataSet, for example).
specify the covariates to use for batch effect removal.
Refer to the FacileData::remove_batch_effect() help for more information.
an fpca result
The FacilePcaAnalysisResult produced here can be used in "the usual" ways,
ie. it can be viz-ualized. shine() is 1/4th-implemented, and report()
has not been worked on yet.
Importantly / interestingly, you can shoot this result into ffsea() to
perform gene set enrichment analysis over a specified dimension to identify
functional categories loaded onto different PCs.
We can compare two PCA results. Currently this just means we compare the
loadings of the features along each PC from fpca results x and y.
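The idea of comparing loadings can be illustrated with a minimal base R sketch. It is not the package's compare() implementation: the simulated matrices, the hypothetical top_var() helper, and the use of stats::prcomp() and cor() are all assumptions made for illustration. Two sample groups are analyzed over the union of their top-variance features, and the feature loadings of the leading PCs are then correlated.

```r
set.seed(7)
genes <- paste0("g", 1:60)

# Two simulated features-by-samples matrices (stand-ins for two sample subsets)
a <- matrix(rnorm(60 * 15), nrow = 60, dimnames = list(genes, paste0("a", 1:15)))
b <- matrix(rnorm(60 * 18), nrow = 60, dimnames = list(genes, paste0("b", 1:18)))

# A per-sample latent factor drives the first 20 genes in both groups
a[1:20, ] <- a[1:20, ] + rep(rnorm(15, sd = 3), each = 20)
b[1:20, ] <- b[1:20, ] + rep(rnorm(18, sd = 3), each = 20)

# Hypothetical helper: names of the n most variable features (rows)
top_var <- function(m, n) {
  names(sort(apply(m, 1, var), decreasing = TRUE))[seq_len(n)]
}

# Rerun both PCAs over the union of the features selected in each group
feats <- union(top_var(a, 25), top_var(b, 25))
pa <- prcomp(t(a[feats, ]), center = TRUE, scale. = FALSE)
pb <- prcomp(t(b[feats, ]), center = TRUE, scale. = FALSE)

# Correlate feature loadings PC by PC; the sign of a PC is arbitrary,
# so the absolute correlation is the quantity of interest
loading.cor <- abs(cor(pa$rotation[, 1:3], pb$rotation[, 1:3]))
loading.cor
```

A large entry in loading.cor[1, 1] suggests the first PC in both groups is driven by the same features.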
Because we assume that PCA is performed on normalized data, we leverage the
batch correction facilities provided by the batch and main parameters
in the FacileData::fetch_assay_data() pipeline. If your samples have a
"sex" covariate defined, for example, you can perform a PCA with
sex-corrected expression values like so: fpca(samples, batch = "sex")
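To make the effect of batch correction concrete, here is a minimal base R sketch of regressing a covariate out of expression values before PCA. It does not reproduce FacileData::remove_batch_effect(); the simulated data and the per-gene linear model fit with lm() are assumptions chosen only to show the principle.

```r
set.seed(42)
n.genes <- 100
n.samples <- 20
sex <- factor(rep(c("f", "m"), each = n.samples / 2))

# Simulated log-expression matrix (features x samples)
expr <- matrix(rnorm(n.genes * n.samples), nrow = n.genes,
               dimnames = list(paste0("g", seq_len(n.genes)),
                               paste0("s", seq_len(n.samples))))
# Add a sex-specific shift to the first 10 genes
expr[1:10, sex == "m"] <- expr[1:10, sex == "m"] + 3

# Fit gene ~ sex for every gene at once (samples in rows of the response),
# keep the residuals, and add each gene's overall mean back
fit <- lm(t(expr) ~ sex)
corrected <- t(residuals(fit)) + rowMeans(expr)

pca.raw <- prcomp(t(expr))
pca.cor <- prcomp(t(corrected))

# Association of PC1 with sex, before and after correction
cor.raw <- abs(cor(pca.raw$x[, 1], as.integer(sex)))
cor.cor <- abs(cor(pca.cor$x[, 1], as.integer(sex)))
c(raw = cor.raw, corrected = cor.cor)
```

In the uncorrected data PC1 largely tracks sex, while after correction the PCs carry no linear sex signal.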
By default, fpca() will assess the variance of all the features (genes) to
perform PCA over, and will keep the top ntop ones. This behavior is
determined by the following three parameters:
filter
determines the method by which features are selected for
analysis. Currently you can only choose "variance" (the default) or "none".
features
determines the universe of features that are available for the
analysis. When NULL (default), all features for the given assay will
be loaded and filtered using the specification of the filter parameter.
If a feature descriptor is provided and filter is not specified, then
we assume that these are the exact features to run the analysis on, and
filter defaults to "none". You may, however, intend for features to
define the universe of features to use prior to filtering, perhaps to
perform a PCA on only a certain set of genes (protein coding), but then
filter those further by variance. In this case, you will need to pass in
the feature descriptor for the universe of features you want to consider,
then explicitly set filter = "variance".
ntop
the default "top" number of features to take when filtering by
variance.
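The variance filter described above can be sketched in base R. This is an illustration under stated assumptions, not the code fpca() runs: select_top_variance() is a hypothetical helper, the matrix is simulated, and stats::prcomp() stands in for whatever PCA backend the package uses.

```r
# Hypothetical helper (not part of the package): keep the `ntop` rows
# (features) with the highest variance across samples
select_top_variance <- function(x, ntop = 1000) {
  stopifnot(is.matrix(x), is.numeric(x))
  vars <- apply(x, 1, var)
  keep <- order(vars, decreasing = TRUE)[seq_len(min(ntop, nrow(x)))]
  x[keep, , drop = FALSE]
}

set.seed(1)
# Simulated features-by-samples matrix: 200 features, 12 samples.
# Row i is drawn with standard deviation i / 20, so later rows vary more.
x <- matrix(rnorm(200 * 12, sd = rep(seq_len(200) / 20, times = 12)),
            nrow = 200,
            dimnames = list(paste0("g", 1:200), paste0("s", 1:12)))

x.top <- select_top_variance(x, ntop = 50)

# prcomp() expects samples in rows, so transpose the filtered matrix
pca <- prcomp(t(x.top), center = TRUE, scale. = FALSE)

# Proportion of variance explained by the first five PCs
head(pca$sdev^2 / sum(pca$sdev^2), 5)
```

Restricting the universe first (the features argument) and then filtering by variance corresponds to subsetting the rows of x before calling the helper.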
Follow progress on implementation of shine() and report() below:
Note that there are methods defined for other assay containers, like an
edgeR::DGEList, limma::EList, and SummarizedExperiment. If these are
called directly, their downstream use within the facile ecosystem isn't
yet fully supported. Development of the FacileBioc package will address this.
The code here is largely inspired by DESeq2's plotPCA.
You should also look at FactoMineR:
http://factominer.free.fr/factomethods/index.html
http://factominer.free.fr/graphs/factoshiny.html
This looks like a useful tutorial to use when explaining the utility of PCA: http://alexhwilliams.info/itsneuronalblog/2016/03/27/pca/
High-Dimensional Data Analysis course by Rafa Irizarry and Michael Love https://online-learning.harvard.edu/course/data-analysis-life-sciences-4-high-dimensional-data-analysis?category[]=84&sort_by=date_added&cost[]=free
We enable the user to supply extra sample covariates that are not found
in the FacileDataStore associated with these samples x by adding them as
extra columns to x.
If manually provided col_covariates have the same name as internal sample covariates, then the manually provided ones will supersede the internal ones.
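The precedence rule can be pictured with a small base R sketch. It assumes samples are keyed by dataset and sample_id columns and uses made-up covariate tables; the package's actual merging logic may differ.

```r
# Internal sample covariates, as they might come from the data store (made up)
internal <- data.frame(
  dataset   = "d1",
  sample_id = c("s1", "s2", "s3"),
  sex       = c("f", "m", "f"),
  stage     = c("I", "II", "III")
)

# Manually supplied covariates: `sex` clashes with an internal column,
# `batch` is new
manual <- data.frame(
  dataset   = "d1",
  sample_id = c("s1", "s2", "s3"),
  sex       = c("F", "M", "F"),
  batch     = c("b1", "b1", "b2")
)

keys <- c("dataset", "sample_id")

# Drop internal columns that clash with manual ones, then join on the keys
clash <- setdiff(intersect(names(internal), names(manual)), keys)
combined <- merge(
  internal[, setdiff(names(internal), clash), drop = FALSE],
  manual,
  by = keys
)
combined
```

The combined table keeps stage from the internal covariates, while sex and batch come from the manually supplied ones.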
efds <- FacileData::exampleFacileDataSet()
p1 <- efds %>%
FacileData::filter_samples(indication == "CRC") %>%
fpca()
p2 <- efds %>%
FacileData::filter_samples(indication == "BLCA") %>%
fpca()
pcmp <- compare(p1, p2)
efds <- FacileData::exampleFacileDataSet()
# A subset of samples ------------------------------------------------------
pca.crc <- efds %>%
FacileData::filter_samples(indication == "CRC") %>%
fpca()
if (interactive()) {
# report(pca.crc, color_aes = "sample_type")
shine(pca.crc)
viz(pca.crc, color_aes = "sex")
}
# Regress "sex" out from expression data
pca.crcs <- FacileData::samples(pca.crc) %>%
fpca(batch = "sex")
if (interactive()) {
viz(pca.crcs, color_aes = "sex")
}
# Perform PCA on only the protein coding genes
genes.pc <- features(efds) %>% subset(meta == "protein_coding")
pca.crc.pc <- samples(pca.crc) %>%
fpca(features = genes.pc, filter = "variance")
# Turn the features loaded on the first three PCs into a gene set database
pca.gdb <- pca.crc %>%
signature(dims = 1:3) %>%
result() %>%
sparrow::GeneSetDb()
# All samples --------------------------------------------------------------
pca.all <- fpca(efds)
if (interactive()) {
viz(pca.all, color_aes = "indication", shape_aes = "sample_type")
# report(pca.all, color_aes = "indication", shape_aes = "sample_type")
}