Immerse bioconductor assay containers into the facile.bio ecosystem. — FacileBiocDataStore-class • FacileBiocData

Bioconductor assay containers, like a DGEList, DESeqDataSet, SummarizedExperiment, etc. can be used within the facie.bio ecosystem by invoking the facilitate() method on them. This will return a Facile* subclass of the container itself.

# S3 method for DESeqDataSet
facilitate(
  x,
  assay_type = "rnaseq",
  feature_type = "infer",
  organism = "unknown",
  ...,
  run_vst = NULL,
  blind = TRUE,
  nsub = 1000,
  fitType = "parametric",
  verbose = FALSE
)

# S3 method for DGEList
facilitate(
  x,
  assay_type = "rnaseq",
  feature_type = "infer",
  organism = "unknown",
  ...
)

Arguments

assay_type	A string that indicates the type of assay stored in the primary assay of the container. For some assay containers, like `DESeqDataSet`, `DGEList`, and `SingleCellExperiment`, we can assume the default value for this to be `"rnaseq"`. For the rest, we assume it's `"lognorm"`.
feature_type	A string that indicates the type of features identifiers the assay containers is using. Default is `"infer"` to try to guess, but this is not the most accurate.
organism	the organism the dataset is for (Homo sapiens, Mus musculus, etc.)
run_vst	should we re-run the vst transformation for a DESeqDataSet. If the `DESeqDataSet` already has a `"vst"` assay, then we'll just take that, otherwise if this isn't set to `FALSE` it will be run.
blind, nsub, fitType	parameters to send to `DESeq2::vst()` to tweak how it is run internally
verbose	make some noise

Details

For instance, facilitate(DGEList) will return a FacileDGEList, which can be used a "normal" DGEList in all the same ways, but is also wrapped with the facile api api and can be used by methods withing the FacileAnalysis, for instance.

These classes are also all subclass of the abstract FacileBiocDataStore virtual class.

DESeqDataSet

When a DESeqDataSet object is facilitate()'d, a normalized count matrix will be calculated using DESeq2::counts(x, normalized = TRUE) and stored as a matrix named "normcounts" in its assays() list. These are the values that are returned by (fetch|with)_assay_data when normalized = TRUE, which differs from the edgeR:cpm normalized count data which is usually returned from most every other expression container.

By default, these normalized counts will be log2 transformed when returned to conform to the expectation in the facilebio ecosystem. To get the deault DESeq2 behaviour, the user would use fetch_assay_data(.., normalized = TRUE, log = FALSE).

This function will also look for variance stabilized versions of the data in the "vst" and "rlog" assay matrices. If no "vst" assay is present, it will be run and stored there, unless the facilitate,run_vst parameter is set to FALSE. This data can be returned using assay_name = "vst"

DGEList

We assume the DGEList holds "rnaseq" assay data. Set the assay_type parameter if that's not the case.

Examples

# DESeq2 --------------------------------------------------------------------
dds <- DESeq2::makeExampleDESeqDataSet(n=2000, m=20)
fd <- facilitate(dds)
#> Warning: 2000 identifiers were either ambiguous or unknown
fetch_assay_data(samples(fd), c("gene1", "gene20"))
#> # A tibble: 40 x 8
#>    dataset sample_id assay assay_type feature_type feature_id feature_name value
#>    <chr>   <chr>     <chr> <chr>      <chr>        <chr>      <chr>        <int>
#>  1 dataset sample1   coun… rnaseq     unknown      gene1      gene1            1
#>  2 dataset sample1   coun… rnaseq     unknown      gene20     gene20           1
#>  3 dataset sample2   coun… rnaseq     unknown      gene1      gene1            1
#>  4 dataset sample2   coun… rnaseq     unknown      gene20     gene20           3
#>  5 dataset sample3   coun… rnaseq     unknown      gene1      gene1            4
#>  6 dataset sample3   coun… rnaseq     unknown      gene20     gene20           0
#>  7 dataset sample4   coun… rnaseq     unknown      gene1      gene1            1
#>  8 dataset sample4   coun… rnaseq     unknown      gene20     gene20           4
#>  9 dataset sample5   coun… rnaseq     unknown      gene1      gene1            1
#> 10 dataset sample5   coun… rnaseq     unknown      gene20     gene20           2
#> # … with 30 more rows
fetch_assay_data(samples(fd), c("gene1", "gene20"), normalized = TRUE)
#> # A tibble: 40 x 8
#>    dataset sample_id assay      assay_type feature_type feature_id feature_name
#>    <chr>   <chr>     <chr>      <chr>      <chr>        <chr>      <chr>       
#>  1 dataset sample1   normcounts tpm        unknown      gene1      gene1       
#>  2 dataset sample1   normcounts tpm        unknown      gene20     gene20      
#>  3 dataset sample2   normcounts tpm        unknown      gene1      gene1       
#>  4 dataset sample2   normcounts tpm        unknown      gene20     gene20      
#>  5 dataset sample3   normcounts tpm        unknown      gene1      gene1       
#>  6 dataset sample3   normcounts tpm        unknown      gene20     gene20      
#>  7 dataset sample4   normcounts tpm        unknown      gene1      gene1       
#>  8 dataset sample4   normcounts tpm        unknown      gene20     gene20      
#>  9 dataset sample5   normcounts tpm        unknown      gene1      gene1       
#> 10 dataset sample5   normcounts tpm        unknown      gene20     gene20      
#> # … with 30 more rows, and 1 more variable: value <dbl>

samples(fd) %>%
  with_assay_data(c("gene1", "gene20"), normalized = TRUE)
#> # A tibble: 20 x 4
#>    dataset sample_id   gene1  gene20
#>    <chr>   <chr>       <dbl>   <dbl>
#>  1 dataset sample1    0.0707  0.0707
#>  2 dataset sample2    0.0775  1.57  
#>  3 dataset sample3    2.00   -3.32  
#>  4 dataset sample4    0.0589  1.95  
#>  5 dataset sample5    0.120   1.05  
#>  6 dataset sample6   -3.32    2.94  
#>  7 dataset sample7    2.76    2.76  
#>  8 dataset sample8   -3.32    3.42  
#>  9 dataset sample9   -3.32   -3.32  
#> 10 dataset sample10   1.99   -3.32  
#> 11 dataset sample11  -3.32   -3.32  
#> 12 dataset sample12   0.979   2.51  
#> 13 dataset sample13   2.31   -3.32  
#> 14 dataset sample14   0.0729 -3.32  
#> 15 dataset sample15  -3.32   -3.32  
#> 16 dataset sample16   0.0380  3.08  
#> 17 dataset sample17   1.55    0.0619
#> 18 dataset sample18   0.0312  3.70  
#> 19 dataset sample19   2.78    4.13  
#> 20 dataset sample20  -3.32    0.0515

# Retrieiving different flavors of normalized expression data
dat <- samples(fd) %>%
  with_assay_data("gene1", normalized = TRUE) %>%
  with_assay_data("gene1", assay_name = "vst") %>%
  select(-(1:2))
colnames(dat) <- c("normcounts", "vst")
pairs(dat)

if (FALSE) {
dpca <- FacileAnalysis::fpca(fd, assay_name = "vst")
}
# edgeR ---------------------------------------------------------------------
y <- example_bioc_data(class = "DGEList")
yf <- facilitate(y)
FacileAnalysis::fpca(yf)
#> Warning: 21 (0.70) values in `event_PFS` covariate failed conversion to numeric
#> Warning: 21 (0.70) values in `tte_PFS` covariate failed conversion to numeric
#> Loading required namespace: testthat
#> ===========================================================
#> FacilePcaAnalysisResult
#> -----------------------------------------------------------
#> Assay used: counts
#> Number of features used: 1000
#> Number of PCs: 5
#> Variance explained:
#>   PC1: 62.47%
#>   PC2: 15.28%
#>   PC3: 7.83%
#>   PC4: 7.64%
#>   PC5: 6.79%
#> ===========================================================
#>