Regress out confounding variables from a data matrix.

The data in x is assumed to be log-like. This function provides a simplified interface to limma::removeBatchEffect(): the batch parameter replaces batch, batch2, and covariates, and the main parameter replaces design. It is mostly used within the fetch_assay_data(..., normalized = TRUE, batch = 'something') pipeline, but is factored out here for general re-use.

remove_batch_effect(
  x,
  sample_info,
  batch = NULL,
  main = NULL,
  maintain.rowmeans = FALSE,
  ...
)

Arguments

x	A matrix of values that needs to be corrected
sample_info	A data.frame of covariate information for the data in `x`. The rows of `sample_info` are assumed to match the columns of `x`. This data.frame should have the covariates named in `batch` and `main` to use for the correction. If `sample_info` is a `facile_frame`, we will endeavor to pull any covariate named in `batch` and `main` that does not already appear in the columns of `sample_info`. Unlike limma's removeBatchEffect, we do not try to fish out the covariate values from anywhere in the "ether". They must be found in this data.frame.
batch	The column names in `sample_info` that specify the batch covariates in the data that will be regressed out.
main	The name of a covariate in `sample_info` that describes the "effect" of an experiment and should not be regressed out. Please refer to the Details section for more information.

Value

a corrected version of the data matrix x.

Details

The batch and main parameters must be character strings that either reference existing columns in sample_info or name covariates that can be retrieved from the FacileDataStore attached to the sample_info facile_frame.

These parameters are used to build a model.matrix with the main and batch effects. Following the use of removeBatchEffect outlined in the post linked below, the design matrix is then pulled apart and the function is called with the corresponding design and covariates parameters:

https://support.bioconductor.org/p/83286/#83287
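
The design-splitting step described above can be sketched directly with limma. This is a minimal illustration on simulated data, not the function's actual implementation: the matrix values, the `treatment` and `sex` covariates, and the variable names `design_main` and `batch_cov` are all assumptions made for the example.

```r
library(limma)

set.seed(1)
# Simulated log-scale data: 5 features x 8 samples (hypothetical values)
x <- matrix(rnorm(40), nrow = 5,
            dimnames = list(paste0("g", 1:5), paste0("s", 1:8)))
sample_info <- data.frame(
  treatment = factor(rep(c("ctrl", "trt"), each = 4)),  # effect to keep ("main")
  sex       = factor(rep(c("f", "m"), times = 4)),      # effect to remove ("batch")
  row.names = colnames(x)
)

# Build one model matrix that contains both the main and the batch effect
full <- model.matrix(~ treatment + sex, data = sample_info)

# Pull the design apart: main-effect columns stay in `design`,
# the remaining (batch) columns are passed as `covariates`
main_cols   <- c("(Intercept)", "treatmenttrt")
design_main <- full[, main_cols, drop = FALSE]
batch_cov   <- full[, setdiff(colnames(full), main_cols), drop = FALSE]

corrected <- removeBatchEffect(x, design = design_main, covariates = batch_cov)

# Refit the full model on the corrected data to inspect the remaining sex effect
refit <- lm(t(corrected) ~ treatment + sex, data = sample_info)
coef(refit)["sexm", ]
```

Because the main effect is kept in `design`, it is protected while the batch columns are regressed out; refitting the full model on `corrected` should give sex coefficients that are numerically zero.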

Setting the maintain.rowmeans parameter to TRUE ensures that the rowMeans of the returned data matrix are the same as those of the original dataset.
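
To make the row-mean guarantee described above concrete, here is a small base R sketch. It is an illustration under stated assumptions, not necessarily how the function does it: the simulated data, the unbalanced `batch` factor, and the naive correction step are all invented for the example.

```r
set.seed(42)
# Simulated data: 3 features x 10 samples, with an unbalanced two-level batch
x <- matrix(rnorm(30, mean = 5), nrow = 3)
batch <- factor(rep(c("a", "b"), times = c(3, 7)))

# Naive correction: estimate the offset of batch "b" relative to "a" for each
# feature and subtract it from the batch "b" samples only
B    <- model.matrix(~ batch)
fit  <- lm.fit(B, t(x))
beta <- fit$coefficients["batchb", ]          # one offset per row of x
naive <- x - outer(beta, B[, "batchb"])

# The naive correction shifts the row means; add the difference back
recentered <- naive + (rowMeans(x) - rowMeans(naive))

all.equal(rowMeans(recentered), rowMeans(x))
```

The re-centering adds a per-row constant, so the relative differences between samples after correction are unchanged; only the overall level of each row is restored.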

Missing values in batch covariates

Some values of the batch and main covariates may be missing (NA). When these covariates are categorical, all missing values will be replaced with a dummy value using the logic from freplace_na().

If numeric covariates are missing, then this will throw an error.
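
The behavior described in this section can be sketched in base R as follows. The helper `fill_missing_covariates()` and the dummy label `"missing"` are hypothetical and exist only for this illustration; they are not the package's freplace_na() logic.

```r
# Hypothetical helper: replace NA in categorical covariates with a dummy level
# and refuse to proceed when a numeric covariate has missing values
fill_missing_covariates <- function(sample_info, covariates, dummy = "missing") {
  for (cname in covariates) {
    vals <- sample_info[[cname]]
    if (!anyNA(vals)) next
    if (is.numeric(vals)) {
      stop("Numeric covariate '", cname, "' has missing values")
    }
    # Keep existing levels (or the sorted observed values) and append the dummy
    lvls <- if (is.factor(vals)) levels(vals) else sort(unique(as.character(vals)))
    vals <- as.character(vals)
    vals[is.na(vals)] <- dummy
    sample_info[[cname]] <- factor(vals, levels = c(setdiff(lvls, dummy), dummy))
  }
  sample_info
}

si <- data.frame(sex = c("f", NA, "m"), age = c(30, NA, 50))

# Categorical covariate: the NA becomes its own level
filled <- fill_missing_covariates(si, "sex")
levels(filled$sex)

# Numeric covariate: an error is raised instead
res <- try(fill_missing_covariates(si, "age"), silent = TRUE)
inherits(res, "try-error")
```

Treating missing categorical values as their own level keeps every sample in the model matrix, whereas a missing numeric value has no such neutral stand-in, which is why it is treated as an error here.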

Examples

# We'll materialize a data matrix and sample_info table from the
# exampleFacileDataSet, then correct the data matrix.
efds <- exampleFacileDataSet()
sample.info <- efds %>%
  filter_samples(indication == "CRC") %>%
  with_sample_covariates()
m <- fetch_assay_data(sample.info, normalized = TRUE, as.matrix = TRUE)
m.rmsex <- remove_batch_effect(m, sample.info, "sex")

# this functionality is called internally from fetch_assay_data to make
# your life easy from within the facile ecosystem itself
m2 <- fetch_assay_data(sample.info, normalized = TRUE,
                       batch = "sex", as.matrix = TRUE)
all.equal(m.rmsex, m2)
#> [1] TRUE
