R/api.R
, R/bioc-DESeqDataSet.R
, R/bioc-DGEList.R
FacileBiocDataStore-class.Rd
Bioconductor assay containers, like a DGEList, DESeqDataSet,
SummarizedExperiment, etc. can be used within the facie.bio ecosystem by
invoking the facilitate()
method on them. This will return a Facile*
subclass of the container itself.
# S3 method for DESeqDataSet facilitate( x, assay_type = "rnaseq", feature_type = "infer", organism = "unknown", ..., run_vst = NULL, blind = TRUE, nsub = 1000, fitType = "parametric", verbose = FALSE ) # S3 method for DGEList facilitate( x, assay_type = "rnaseq", feature_type = "infer", organism = "unknown", ... )
assay_type | A string that indicates the type of assay stored in the
primary assay of the container. For some assay containers, like
|
---|---|
feature_type | A string that indicates the type of features identifiers
the assay containers is using. Default is |
organism | the organism the dataset is for (Homo sapiens, Mus musculus, etc.) |
run_vst | should we re-run the vst transformation for a DESeqDataSet.
If the |
blind, nsub, fitType | parameters to send to |
verbose | make some noise |
For instance, facilitate(DGEList)
will return a FacileDGEList
, which can
be used a "normal" DGEList in all the same ways, but is also wrapped with
the facile api api and can be used by methods withing the FacileAnalysis
,
for instance.
These classes are also all subclass of the abstract FacileBiocDataStore
virtual class.
When a DESeqDataSet object is facilitate()'d, a normalized count matrix
will be calculated using DESeq2::counts(x, normalized = TRUE)
and stored as
a matrix named "normcounts"
in its assays()
list. These are the values
that are returned by (fetch|with)_assay_data when normalized = TRUE
, which
differs from the edgeR:cpm normalized count data which is usually returned
from most every other expression container.
By default, these normalized counts will be log2 transformed when returned to
conform to the expectation in the facilebio ecosystem. To get the deault
DESeq2 behaviour, the user would use
fetch_assay_data(.., normalized = TRUE, log = FALSE)
.
This function will also look for variance stabilized versions of the
data in the "vst"
and "rlog"
assay matrices. If no "vst"
assay is
present, it will be run and stored there, unless the facilitate,run_vst
parameter is set to FALSE
. This data can be returned using
assay_name = "vst"
We assume the DGEList holds "rnaseq"
assay data. Set the assay_type
parameter if that's not the case.
# DESeq2 -------------------------------------------------------------------- dds <- DESeq2::makeExampleDESeqDataSet(n=2000, m=20) fd <- facilitate(dds)#> Warning: 2000 identifiers were either ambiguous or unknown#> # A tibble: 40 x 8 #> dataset sample_id assay assay_type feature_type feature_id feature_name value #> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int> #> 1 dataset sample1 coun… rnaseq unknown gene1 gene1 1 #> 2 dataset sample1 coun… rnaseq unknown gene20 gene20 1 #> 3 dataset sample2 coun… rnaseq unknown gene1 gene1 1 #> 4 dataset sample2 coun… rnaseq unknown gene20 gene20 3 #> 5 dataset sample3 coun… rnaseq unknown gene1 gene1 4 #> 6 dataset sample3 coun… rnaseq unknown gene20 gene20 0 #> 7 dataset sample4 coun… rnaseq unknown gene1 gene1 1 #> 8 dataset sample4 coun… rnaseq unknown gene20 gene20 4 #> 9 dataset sample5 coun… rnaseq unknown gene1 gene1 1 #> 10 dataset sample5 coun… rnaseq unknown gene20 gene20 2 #> # … with 30 more rows#> # A tibble: 40 x 8 #> dataset sample_id assay assay_type feature_type feature_id feature_name #> <chr> <chr> <chr> <chr> <chr> <chr> <chr> #> 1 dataset sample1 normcounts tpm unknown gene1 gene1 #> 2 dataset sample1 normcounts tpm unknown gene20 gene20 #> 3 dataset sample2 normcounts tpm unknown gene1 gene1 #> 4 dataset sample2 normcounts tpm unknown gene20 gene20 #> 5 dataset sample3 normcounts tpm unknown gene1 gene1 #> 6 dataset sample3 normcounts tpm unknown gene20 gene20 #> 7 dataset sample4 normcounts tpm unknown gene1 gene1 #> 8 dataset sample4 normcounts tpm unknown gene20 gene20 #> 9 dataset sample5 normcounts tpm unknown gene1 gene1 #> 10 dataset sample5 normcounts tpm unknown gene20 gene20 #> # … with 30 more rows, and 1 more variable: value <dbl>#> # A tibble: 20 x 4 #> dataset sample_id gene1 gene20 #> <chr> <chr> <dbl> <dbl> #> 1 dataset sample1 0.0707 0.0707 #> 2 dataset sample2 0.0775 1.57 #> 3 dataset sample3 2.00 -3.32 #> 4 dataset sample4 0.0589 1.95 #> 5 dataset sample5 0.120 1.05 #> 6 dataset sample6 -3.32 2.94 #> 7 dataset sample7 2.76 2.76 #> 8 dataset sample8 -3.32 3.42 #> 9 dataset sample9 -3.32 -3.32 #> 10 dataset sample10 1.99 -3.32 #> 11 dataset sample11 -3.32 -3.32 #> 12 dataset sample12 0.979 2.51 #> 13 dataset sample13 2.31 -3.32 #> 14 dataset sample14 0.0729 -3.32 #> 15 dataset sample15 -3.32 -3.32 #> 16 dataset sample16 0.0380 3.08 #> 17 dataset sample17 1.55 0.0619 #> 18 dataset sample18 0.0312 3.70 #> 19 dataset sample19 2.78 4.13 #> 20 dataset sample20 -3.32 0.0515# Retrieiving different flavors of normalized expression data dat <- samples(fd) %>% with_assay_data("gene1", normalized = TRUE) %>% with_assay_data("gene1", assay_name = "vst") %>% select(-(1:2)) colnames(dat) <- c("normcounts", "vst") pairs(dat)if (FALSE) { dpca <- FacileAnalysis::fpca(fd, assay_name = "vst") } # edgeR --------------------------------------------------------------------- y <- example_bioc_data(class = "DGEList") yf <- facilitate(y) FacileAnalysis::fpca(yf)#> Warning: 21 (0.70) values in `event_PFS` covariate failed conversion to numeric#> Warning: 21 (0.70) values in `tte_PFS` covariate failed conversion to numeric#>#> =========================================================== #> FacilePcaAnalysisResult #> ----------------------------------------------------------- #> Assay used: counts #> Number of features used: 1000 #> Number of PCs: 5 #> Variance explained: #> PC1: 62.47% #> PC2: 15.28% #> PC3: 7.83% #> PC4: 7.64% #> PC5: 6.79% #> =========================================================== #>