Fetch assay data from single assay of choice — fetch_assay

The fetch_assay_data() and with_assay_data() functions are among the main workhorse functions of the facile ecosystem. They retrieve raw and normalized assay data from a FacileData container.

fetch_assay_data(
  x,
  features,
  samples = NULL,
  assay_name = default_assay(x),
  normalized = FALSE,
  batch = NULL,
  main = NULL,
  as.matrix = FALSE,
  ...,
  subset.threshold = 700,
  aggregate = FALSE,
  aggregate.by = "ewm",
  verbose = FALSE
)

# S3 method for facile_frame
with_assay_data(
  x,
  features,
  assay_name = NULL,
  normalized = TRUE,
  aggregate = FALSE,
  aggregate.by = "ewm",
  spread = TRUE,
  with_assay_name = FALSE,
  ...,
  verbose = FALSE,
  .fds = fds(x)
)

Arguments

x	A `FacileDataStore` object, or a `facile_frame`
features	a feature descriptor (a data.frame with `assay` and `feature_id` columns)
samples	a samples descriptor (a data.frame with `dataset` and `sample_id` columns)
assay_name	the name of the assay to fetch data from. Defaults to the value of `default_assay()` for `x`. Must be one of `assay_names(x)`.
normalized	whether to return normalized (`TRUE`) or raw (`FALSE`) data values. Defaults to `FALSE` in `fetch_assay_data()` and to `TRUE` in `with_assay_data()`. This is only really "functional" for assays with `assay_type = "rnaseq"`, where the normalized data is log2(CPM). These values can be tweaked with the `log` (`TRUE` or `FALSE`) and `prior.count` parameters, which are passed down internally to (eventually) `edgeR::cpm()`.
batch	the column names in `sample_info` that specify the batch covariates to be regressed out of the data.
main	the name of a covariate in `sample_info` that describes the "effect" of the experiment, which should not be regressed out. Refer to the Details section for more information.
as.matrix	by default, the data is returned in a long-form tbl-like result. If set to `TRUE`, the data is returned as a matrix.
...	parameters to pass to normalization methods
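subset.threshold	the number of requested features above which all features are fetched and then subset, because fetching everything is sometimes faster than subsetting directly. Why this happens is not yet understood, but previous tests with random feature sets of different lengths put the elbow at around 700 features.
aggregate	whether to return individual, feature-level results (`FALSE`) or gene set scores (`TRUE`).
aggregate.by	the aggregation method used when `aggregate = TRUE`. Only `"ewm"` (eigenWeightedMean) is currently supported.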
.fds	A `FacileDataSet` object
feature_ids	character vector of feature_ids
with_symbols	whether gene symbols should be returned as well
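For `assay_type = "rnaseq"`, the `normalized` values described above are log2(CPM). The sketch below is a base-R approximation of how a log2-CPM with a `prior.count` can be computed, loosely modeled on `edgeR::cpm(log = TRUE)`. It is not the code FacileData runs: the function name `log2_cpm_sketch()` is hypothetical, library sizes are assumed to be plain column sums, and normalization factors (e.g. TMM) are ignored.

```r
# Hypothetical helper: a base-R approximation of log2(CPM), loosely modeled
# on edgeR::cpm(log = TRUE). Assumes library size = column sums and ignores
# normalization factors.
log2_cpm_sketch <- function(counts, prior.count = 2, log = TRUE) {
  # counts: features x samples matrix of raw counts
  lib.size <- colSums(counts)
  if (!log) {
    # plain counts per million
    return(t(t(counts) / lib.size) * 1e6)
  }
  # Scale the prior count by each sample's relative library size, and
  # enlarge the library sizes to match, before taking log2
  prior <- prior.count * lib.size / mean(lib.size)
  lib.adj <- lib.size + 2 * prior
  log2(t((t(counts) + prior) / lib.adj) * 1e6)
}

# Toy counts: 3 features x 2 samples (made-up values)
counts <- matrix(c(10, 0, 90,
                   20, 5, 75),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
cpm <- log2_cpm_sketch(counts, log = FALSE)
lcpm <- log2_cpm_sketch(counts)
```

The prior count keeps zero counts finite on the log scale; scaling it by relative library size means deeper-sequenced samples receive a proportionally larger offset in raw counts.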

Value

A tibble (lazy or not) with assay data.

a tbl-like result

Details

fetch_assay_data(x, ...) returns the data in long form. with_assay_data(x, ...) is most typically used when you already have a dataset x (a facile_frame) that you want to decorate with more assay data: the requested assay data is appended to x in wide format. Because fetch_assay_data() is most often used at a lower level of granularity, `normalized` defaults to FALSE there, while it defaults to TRUE in with_assay_data().
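The difference between the long form returned by fetch_assay_data() and the wide form appended by with_assay_data() can be illustrated with a toy table in base R. The column names below (`dataset`, `sample_id`, `feature_name`, `value`) and the values are made up for illustration, and base `reshape()` stands in for whatever with_assay_data() does internally.

```r
# Toy long-form assay data: one row per sample x feature (made-up values)
long <- data.frame(
  dataset      = "BLCA",
  sample_id    = rep(c("s1", "s2"), each = 2),
  feature_name = rep(c("PRF1", "GZMA"), times = 2),
  value        = c(5.1, 7.3, 4.2, 6.8),
  stringsAsFactors = FALSE
)

# Wide form: one row per sample, one column per feature
wide <- reshape(
  long,
  idvar     = c("dataset", "sample_id"),
  timevar   = "feature_name",
  direction = "wide"
)
# reshape() names the new columns "value.PRF1", "value.GZMA"; strip the prefix
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL
wide
```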

Removing Batch Effects

When normalized data is returned, we assume the data are log-like, and you have the option to regress out batch effects (via the `batch` and `main` arguments) using our remove_batch_effect() wrapper around limma::removeBatchEffect().
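Conceptually, limma::removeBatchEffect() fits a linear model with the design (which protects the `main` effect) plus the batch terms for every feature, then subtracts only the fitted batch terms. The base-R sketch below illustrates that idea under simplifying assumptions: `remove_batch_sketch()` is a hypothetical name, `main` here is a vector of per-sample labels rather than a `sample_info` column name, and this is not the remove_batch_effect() implementation itself.

```r
# Hypothetical helper illustrating the idea behind limma::removeBatchEffect():
# fit y ~ design + batch for every feature, then subtract only the batch terms.
remove_batch_sketch <- function(y, batch, main = NULL) {
  # y: features x samples matrix of log-scale values
  # Design for effects to keep: intercept only, or intercept + main groups
  design <- if (is.null(main)) {
    matrix(1, ncol(y), 1)
  } else {
    model.matrix(~ factor(main))
  }
  # Batch terms coded with sum-to-zero contrasts (drop the intercept column)
  batch <- factor(batch)
  Xb <- model.matrix(~ batch, contrasts.arg = list(batch = "contr.sum"))
  Xb <- Xb[, -1, drop = FALSE]
  # Least-squares fit of every feature at once: coefs is ncol(D) x nfeatures
  D <- cbind(design, Xb)
  coefs <- qr.coef(qr(D), t(y))
  beta <- coefs[-seq_len(ncol(design)), , drop = FALSE]
  beta[is.na(beta)] <- 0
  # Subtract only the fitted batch contribution
  y - t(Xb %*% beta)
}

# Usage: 3 features x 8 samples, with batch "b" shifted up by 2
set.seed(1)
batch <- rep(c("a", "b"), each = 4)
y <- matrix(rnorm(3 * 8), nrow = 3)
y[, batch == "b"] <- y[, batch == "b"] + 2
corrected <- remove_batch_sketch(y, batch)
```

With sum-to-zero contrasts on the batch factor, each batch is shifted toward the average of the batch means rather than toward a reference batch.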

Examples

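library(FacileData)

# tumor samples from the BLCA indication
samples <- exampleFacileDataSet() %>%
  filter_samples(indication == "BLCA", sample_type == "tumor")
# Entrez gene IDs, named by gene symbol
features <- c(PRF1='5551', GZMA='3001', CD274='29126')
# regress out a single batch covariate
dat <- with_assay_data(samples, features, normalized = TRUE, batch = "sex")
# regress out multiple batch covariates
dat <- with_assay_data(samples, features, normalized = TRUE,
                       batch = c("sex", "stage"))
# protect the "main" effect while regressing out the batch covariates
dat <- with_assay_data(samples, features, normalized = TRUE,
                       batch = c("sex", "stage"), main = "sample_type")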