The data in x are assumed to be log-like. This function provides a simplified interface to limma::removeBatchEffect(): the batch parameter here replaces that function's batch, batch2, and covariates parameters, and main replaces its design parameter. It is mostly used within the fetch_assay_data(..., normalized = TRUE, batch = 'something') pipeline, but has been factored out here for general reuse.

remove_batch_effect(
  x,
  sample_info,
  batch = NULL,
  main = NULL,
  maintain.rowmeans = FALSE,
  ...
)

Arguments

x

A matrix of values that needs to be corrected

sample_info

a data.frame of covariate information for the data in x. The rows of sample_info are assumed to match the columns of x. This data.frame should have the covariates named in batch and main to use for the correction. If sample_info is a facile_frame, we will endeavor to pull any covariate named in batch and main that does not already appear in the columns of sample_info. Unlike limma's removeBatchEffect, we do not try to fish the covariate values out of the "ether": they must be found in this data.frame.

batch

The column names in sample_info that specify the batch covariates in the data that will be regressed out.

main

The name of a covariate in sample_info that describes the "effect" of an experiment, which should not be regressed out. Please refer to the Details section for more information.

Value

a corrected version of the data matrix x.

Details

The batch and main parameters must be character vectors that either reference existing columns in sample_info, or name covariates that can be retrieved from the FacileDataStore attached to the sample_info facile_frame.

We use these parameters to build a model.matrix that contains both the main and batch effects. Following the use of removeBatchEffect outlined in the post linked below, we then pull this design matrix apart and call the function with the corresponding design and covariates parameters:

https://support.bioconductor.org/p/83286/#83287
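
The design-splitting step can be sketched with base R and limma. This is only a minimal illustration, not this function's actual implementation: the toy matrix x, the treatment and sex columns, and the variable names are all assumptions made up for the example.

```r
# Toy data (made up for illustration): 4 features measured on 6 samples.
set.seed(1)
x <- matrix(rnorm(24), nrow = 4,
            dimnames = list(paste0("g", 1:4), paste0("s", 1:6)))
sample_info <- data.frame(
  treatment = factor(rep(c("ctrl", "trt"), each = 3)),  # plays the role of `main`
  sex       = factor(c("f", "m", "f", "m", "f", "m")),  # plays the role of `batch`
  row.names = colnames(x)
)

# One model matrix that holds both the main effect and the batch effect.
mm_formula <- ~ treatment + sex
mm <- model.matrix(mm_formula, data = sample_info)

# The "assign" attribute maps each column to the term that produced it
# (0 = intercept). Columns that come from the batch term become `covariates`;
# the intercept and main-effect columns stay in `design`.
batch_term <- which(attr(terms(mm_formula), "term.labels") == "sex")
is_batch   <- attr(mm, "assign") == batch_term

design     <- mm[, !is_batch, drop = FALSE]
covariates <- mm[, is_batch, drop = FALSE]

# Only run the correction when limma is installed.
if (requireNamespace("limma", quietly = TRUE)) {
  corrected <- limma::removeBatchEffect(x, design = design,
                                        covariates = covariates)
}
```

Keeping the main effect in design protects it from being regressed out along with the batch columns.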

Setting the maintain.rowmeans parameter to TRUE ensures that the row means of the returned data matrix are the same as those of the original dataset. Note that the usage above lists FALSE as its default.
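
One way row means could be preserved after a correction is to add back the per-row shift. The sketch below assumes this simple re-centering approach, which may differ from what the function does internally; the matrices are made-up stand-ins.

```r
# Made-up original matrix: 2 features by 4 samples.
x <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 2)

# Stand-in for a batch-corrected matrix whose row means have shifted
# (here we simply subtract 1 from the last two samples).
corrected <- x
corrected[, 3:4] <- corrected[, 3:4] - 1

# Add each row's shift back. A vector of length nrow() is recycled down
# the columns, so every row receives its own offset.
restored <- corrected + (rowMeans(x) - rowMeans(corrected))

all.equal(rowMeans(restored), rowMeans(x))
```

The differences between samples within a row are untouched by this step; only the overall level of each row changes.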

Missing values in batch covariates

Some values of the batch and main covariates may be missing (NA). When these covariates are categorical, all missing values are replaced with a dummy value using the logic from freplace_na().

If numeric covariates are missing, then this will throw an error.
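
The behavior described in this section can be illustrated with a small base R sketch. The helpers replace_na_level() and prepare_covariate() and the dummy label "missing" are hypothetical names invented for this example; they are not the package's freplace_na() and may differ from it in detail.

```r
# Hypothetical helper: turn NA entries of a categorical covariate into an
# explicit dummy level so they survive model.matrix().
replace_na_level <- function(values, dummy = "missing") {
  f <- as.factor(values)
  if (anyNA(f)) {
    levels(f) <- union(levels(f), dummy)  # add the dummy level if needed
    f[is.na(f)] <- dummy
  }
  f
}

# Hypothetical helper: categorical covariates get a dummy level, while
# numeric covariates with missing values are rejected with an error.
prepare_covariate <- function(values) {
  if (is.numeric(values)) {
    if (anyNA(values)) stop("numeric covariate contains missing values")
    return(values)
  }
  replace_na_level(values)
}

sex <- prepare_covariate(c("f", NA, "m"))
levels(sex)

# A numeric covariate with an NA raises an error, caught here for display.
age_result <- tryCatch(prepare_covariate(c(34, NA, 51)),
                       error = function(e) conditionMessage(e))
```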

See also

fetch_assay_data() when batch = "something"

Examples

# We'll materialize a data matrix and sample_info table from the
# exampleFacileDataSet, then correct the data matrix.
efds <- exampleFacileDataSet()
sample.info <- efds %>%
  filter_samples(indication == "CRC") %>%
  with_sample_covariates()
m <- fetch_assay_data(sample.info, normalized = TRUE, as.matrix = TRUE)
m.rmsex <- remove_batch_effect(m, sample.info, "sex")

# this functionality is called internally from fetch_assay_data to make
# your life easy from within the facile ecosystem itself
m2 <- fetch_assay_data(sample.info, normalized = TRUE, batch = "sex",
                       as.matrix = TRUE)
all.equal(m.rmsex, m2)
#> [1] TRUE