Construct a Bioconductor-classed object from an analysis.
# S3 method for FacileLinearModelDefinition
biocbox(
  x,
  assay_name = NULL,
  method = NULL,
  features = NULL,
  filter = "default",
  filter_universe = NULL,
  filter_require = NULL,
  with_sample_weights = FALSE,
  weights = NULL,
  block = NULL,
  prior_count = 0.1,
  ...
)
# S3 method for FacileDgeAnalysisResult
biocbox(x, cached = TRUE, ...)
the name of the assay to pull data for
the name of the dge method that will be used. This dictates how the data are post-processed.
A filtering policy to remove uninteresting genes. If "default" (which is the
default), then edgeR::filterByExpr() is used if we are materializing a DGEList;
otherwise lowly expressed features are removed in a similarly "naive" manner.
This can, alternatively, be a character vector that holds the names of the
features that should be kept. Default value: "default".
Some methods that leverage the limma pipeline, like "voom", "limma", and
"limma-trend", can use sample (array) quality weights to downweight outlier
samples. When method == "voom", we use limma::voomWithQualityWeights(), while
the rest use limma::arrayWeights(). The choice of method determines which
sample weighting function to use. Defaults to FALSE.
The pseudo-count to add to count data. Used primarily when running the
limma-trend method on count (RNA-seq) data.
passed down to internal modeling and filtering functions.
a facile_frame that enumerates the samples to fetch data for, as well as the
covariates used in downstream analysis
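The rule given above for choosing a sample-weighting function can be summarized in a few lines. This is only a sketch of the documented rule, not the package's implementation: `sample_weight_fn()` is a hypothetical helper introduced here for illustration, and it returns the name of the limma function rather than calling it.

```r
# Hypothetical helper (not part of the package): report which limma
# weighting function the documented rule selects for a given method.
sample_weight_fn <- function(method, with_sample_weights = FALSE) {
  # Methods the text names as able to use sample quality weights.
  limma_methods <- c("voom", "limma", "limma-trend")
  if (!with_sample_weights || !(method %in% limma_methods)) {
    # No sample weighting requested, or the method does not support it.
    return(NA_character_)
  }
  if (method == "voom") {
    "limma::voomWithQualityWeights"
  } else {
    "limma::arrayWeights"
  }
}

sample_weight_fn("voom", with_sample_weights = TRUE)
sample_weight_fn("limma-trend", with_sample_weights = TRUE)
sample_weight_fn("voom")
```

How methods outside the limma pipeline are handled is an assumption of this sketch; here they simply yield `NA`.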
a DGEList or EList with assay data in the correct place, and all of the
covariates in the $samples or $targets data.frame that are required to test
the model in mdef.
This function accepts a model defined using flm_def() and creates the
appropriate Bioconductor assay container to test the model given the
assay_name and dge method specified by the user.
This function currently supports retrieving data and converting it into a DGEList (for count-like data) or an EList (for data that can be analyzed with one form of limma or another).
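For orientation, the two container types look roughly as follows when built by hand with edgeR and limma. This is a sketch on simulated data, not what `biocbox()` does internally; the matrix, the `condition` covariate, and the sample names are all made up for illustration, and the edgeR and limma packages are assumed to be installed.

```r
library(edgeR)  # DGEList(), cpm()
library(limma)  # EList class

set.seed(42)
# Simulated count matrix: 10 features x 4 samples (illustrative only).
counts <- matrix(
  rnbinom(40, mu = 100, size = 10), nrow = 10,
  dimnames = list(paste0("gene", 1:10), paste0("s", 1:4))
)
# Sample-level covariates, one row per sample.
covariates <- data.frame(
  condition = factor(c("a", "a", "b", "b")),
  row.names = colnames(counts)
)

# Count-like data: a DGEList, with covariates stored in $samples.
y <- DGEList(counts = counts, samples = covariates)

# Log-scale data: an EList, with covariates stored in $targets.
log_expr <- cpm(y, log = TRUE, prior.count = 0.1)
el <- new("EList", list(E = log_expr, targets = covariates))

colnames(y$samples)
colnames(el$targets)
```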
Assumptions on different assay_type values include:
rnaseq: assumed to be "vanilla" bulk rnaseq gene counts
umi: UMI count data from bulk rnaseq, like quantseq
tpm: TPM values. These will be log2(TPM + prior_count) transformed, then differentially tested using the limma-trend pipeline
TODO: support affymrna, affymirna, etc. assay types
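The transformation described for tpm assays is a plain log2 with a pseudo-count. A minimal base R sketch follows, using a toy matrix whose values are invented for illustration and the default `prior_count` of 0.1 from the usage above:

```r
# Toy TPM matrix: 3 features x 2 samples (made-up values).
tpm <- matrix(
  c(0, 1.9, 7.9, 0.9, 3.9, 15.9), nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2"))
)
prior_count <- 0.1

# log2(TPM + prior_count): the pseudo-count keeps zero TPM values finite.
log_tpm <- log2(tpm + prior_count)
log_tpm
```

With a pseudo-count of 0.1, a feature with zero TPM maps to log2(0.1), roughly -3.32, rather than negative infinity.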
The "filter" parameters are described in the fdge() function for now.
Given a FacileDgeAnalysisResult, we can re-materialize the Bioconductor assay
container used within the differential testing pipeline run by fdge().
Currently we have limited our analysis framework to work over either DGEList
(edgeR) or EList (limma) containers.