The (fetch|with)_assay_data functions are some of the main workhorse functions of the facile ecosystem. These calls enable you to retrieve raw and normalized assay data from a FacileData container.

fetch_assay_data(
  x,
  features,
  samples = NULL,
  assay_name = default_assay(x),
  normalized = FALSE,
  batch = NULL,
  main = NULL,
  as.matrix = FALSE,
  ...,
  subset.threshold = 700,
  aggregate = FALSE,
  aggregate.by = "ewm",
  verbose = FALSE
)

# S3 method for facile_frame
with_assay_data(
  x,
  features,
  assay_name = NULL,
  normalized = TRUE,
  aggregate = FALSE,
  aggregate.by = "ewm",
  spread = TRUE,
  with_assay_name = FALSE,
  ...,
  verbose = FALSE,
  .fds = fds(x)
)

Arguments

x

A FacileDataStore object, or facile_frame

features

a feature descriptor (data.frame with assay and feature_id columns)

samples

a samples descriptor

assay_name

the name of the assay to fetch data from. Defaults to the value of default_assay() for x. Must be a subset of assay_names(x).

normalized

return normalized or raw data values. Defaults to FALSE in fetch_assay_data() and to TRUE in with_assay_data(). This is only really "functional" for assay_type = "rnaseq" types of assays, where the normalized data is log2(CPM). These values can be tweaked with the log = (TRUE|FALSE) and prior.count parameters, which are passed down internally to (eventually) edgeR::cpm().

batch

The column names in sample_info that specify the batch covariates in the data that will be regressed out.

main

The name of a covariate in sample_info that describes the "effect" of an experiment and should not be regressed out. Please refer to the Details section for more information.

as.matrix

by default, the data is returned in a long-form tbl-like result. If set to TRUE, the data is returned as a matrix.

...

parameters to pass to normalization methods

subset.threshold

sometimes fetching all of the features is faster than trying to subset. The reason is not yet understood, but tests with random feature sets of different sizes put the elbow at around 700 features.

aggregate.by

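the method used to summarize features into a single geneset score when aggregate = TRUE, as opposed to returning individual feature-level results. Use 'ewm' for eigenWeightedMean, which is currently the only option.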

.fds

A FacileDataSet object

feature_ids

character vector of feature_ids

with_symbols

Do you want gene symbols returned, too?

Value

A tibble (lazy or not) with assay data.

a tbl-like result

Details

fetch_assay_data(x, ...) will return the data in long form. with_assay_data(x, ...) is most typically used when you already have a dataset x (a facile_frame) that you want to decorate with more assay data. The assay data asked for will be appended onto x in wide format. Because fetch is (most often) used at a lower level of granularity, normalized is set to FALSE by default there, while it is set to TRUE in with_assay_data.
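
To make the long versus wide distinction concrete, here is a minimal base R sketch on toy data. It does not call the facile functions; the column names (sample_id, feature_id, value, sex) and the resulting value.<feature_id> columns are assumptions chosen for illustration and may not match what the package actually returns.

```r
# Toy long-form assay data: one row per (sample, feature) pair.
# Column names here are hypothetical, chosen only for illustration.
long <- data.frame(
  sample_id  = rep(c("s1", "s2"), each = 2),
  feature_id = rep(c("5551", "3001"), times = 2),
  value      = c(1.2, 3.4, 0.8, 2.9)
)

# Spread to wide form: one row per sample, one column per feature.
wide <- reshape(long, idvar = "sample_id", timevar = "feature_id",
                direction = "wide")

# "Decorate" an existing sample-level table with the wide assay columns.
samples <- data.frame(sample_id = c("s1", "s2"), sex = c("f", "m"))
decorated <- merge(samples, wide, by = "sample_id")
```

The long form is convenient for filtering and plotting many features at once, while the wide form keeps one row per sample, which is why it suits decorating an existing sample table.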

Removing Batch Effects

When normalized data is returned, we assume these data are log-like, and you have the option to regress out batch effects using our remove_batch_effect() wrapper to limma::removeBatchEffect().
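
As a rough illustration of what regressing out a batch covariate while protecting a main effect means, the following base R sketch works on a toy log-scale matrix. It is not the package's implementation (which wraps limma::removeBatchEffect()); the toy data, the variable names, and the direct use of lm.fit() are all assumptions made for illustration.

```r
# Toy log2-scale matrix: 3 features (rows) x 6 samples (columns).
set.seed(1)
batch <- factor(c("a", "a", "a", "b", "b", "b"))
main  <- factor(c("ctl", "trt", "ctl", "trt", "ctl", "trt"))
x <- matrix(rnorm(18, mean = 5), nrow = 3,
            dimnames = list(paste0("gene", 1:3), paste0("s", 1:6)))
x[, batch == "b"] <- x[, batch == "b"] + 2  # artificial batch shift

# Design for the effect to keep, plus sum-to-zero coded batch columns.
design  <- model.matrix(~ main)
batch_x <- model.matrix(~ batch,
                        contrasts.arg = list(batch = "contr.sum"))
batch_x <- batch_x[, -1, drop = FALSE]  # drop the intercept column

# Fit both sets of terms jointly, then subtract only the batch component.
fit <- lm.fit(cbind(design, batch_x), t(x))
beta_batch <- fit$coefficients[-seq_len(ncol(design)), , drop = FALSE]
x_adj <- x - t(batch_x %*% beta_batch)
```

Fitting the main covariate alongside the batch terms means that only variation attributable to batch is subtracted; the effect described by main stays in x_adj.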

Examples

samples <- exampleFacileDataSet() %>%
  filter_samples(indication == "BLCA", sample_type == "tumor")
features <- c(PRF1='5551', GZMA='3001', CD274='29126')
dat <- with_assay_data(samples, features, normalized = TRUE, batch = "sex")
dat <- with_assay_data(samples, features, normalized = TRUE,
                       batch = c("sex", "stage"))
dat <- with_assay_data(samples, features, normalized = TRUE,
                       batch = c("sex", "stage"), main = "sample_type")