R/remove_batch_effect.R
remove_batch_effect.Rd
Data x is assumed to be log-like, and this function provides a simplified interface to limma::removeBatchEffect(). The batch parameter replaces limma's batch, batch2, and covariates parameters. The design parameter is replaced with main. This function is mostly for use within the fetch_assay_data(..., normalized = TRUE, batch = 'something') pipeline, but it has been refactored out here for general re-use.
remove_batch_effect(
  x,
  sample_info,
  batch = NULL,
  main = NULL,
  maintain.rowmeans = FALSE,
  ...
)
Argument | Description |
---|---|
x | A matrix of values that needs to be corrected |
sample_info | A data.frame of covariate information for the data in x |
batch | The column names in sample_info |
main | The name of a covariate in sample_info |
A corrected version of the data matrix x.
The batch and main parameters must be character vectors that either reference existing columns in sample_info, or name covariates that can be retrieved from a FacileDataStore that is attached to the sample_info facile_frame.
These parameters are used to build a model.matrix with main and batch effects. Following the use of removeBatchEffect outlined in the post linked below, the design matrix is then pulled apart and the function is called with the corresponding design and covariates parameters:
https://support.bioconductor.org/p/83286/#83287
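The split described above can be sketched on simulated data. This is only an illustration, assuming limma is installed. The sample_info columns (treatment standing in for main, sex standing in for batch) and the matrix are made up, and the code is not necessarily how remove_batch_effect() is implemented internally.

```r
library(limma)

set.seed(1)
# Hypothetical sample annotation: 'treatment' plays the role of `main`,
# 'sex' plays the role of `batch`.
sample_info <- data.frame(
  treatment = factor(rep(c("ctrl", "trt"), each = 6)),
  sex = factor(rep(c("f", "m"), times = 6))
)

# Simulated log-scale matrix: 20 features x 12 samples.
x <- matrix(
  rnorm(20 * nrow(sample_info)),
  nrow = 20,
  dimnames = list(paste0("g", 1:20), paste0("s", seq_len(nrow(sample_info))))
)

# One design matrix holding both effects: main term first, batch term second.
design <- model.matrix(~ treatment + sex, data = sample_info)

# attr(design, "assign") maps each column to its model term:
# 0 = intercept, 1 = treatment, 2 = sex.
term <- attr(design, "assign")
main_design <- design[, term <= 1, drop = FALSE]   # kept in the data
batch_design <- design[, term == 2, drop = FALSE]  # removed from the data

corrected <- removeBatchEffect(
  x,
  design = main_design,
  covariates = batch_design
)
```

Splitting on the assign attribute rather than on hard-coded column positions keeps the sketch valid when a factor has more than two levels and therefore spans several columns.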
Setting the maintain.rowmeans parameter to TRUE ensures that the rowMeans of the returned data matrix are the same as those of the original dataset. Per the usage above, it defaults to FALSE.
Some of the levels of the batch and main covariates may be missing (NA). When these covariates are categorical, all missing values are replaced with a dummy value using the logic from freplace_na(). If numeric covariates have missing values, an error is thrown.
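The behaviour described above can be illustrated with a small base-R helper. The function name fill_missing_covariate and the dummy label "NA." are assumptions made for this sketch. It is not the package's freplace_na(), whose exact logic may differ.

```r
# Hypothetical helper, NOT the package's freplace_na(): mimics the documented
# behaviour for a single covariate column.
fill_missing_covariate <- function(values, dummy = "NA.") {
  if (!anyNA(values)) {
    return(values)
  }
  if (is.numeric(values)) {
    # Numeric covariates with missing values cannot be given a dummy level.
    stop("numeric covariates must not contain missing values")
  }
  was_factor <- is.factor(values)
  lvls <- if (was_factor) levels(values) else NULL
  out <- as.character(values)
  out[is.na(out)] <- dummy
  # Restore factor-ness, appending the dummy value as an extra level.
  if (was_factor) factor(out, levels = union(lvls, dummy)) else out
}

sex <- factor(c("f", NA, "m", "f"))
filled <- fill_missing_covariate(sex)
```

Giving the missing samples their own level lets them stay in the model matrix instead of being silently dropped by model.matrix().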
fetch_assay_data(), when batch = "something"
# We'll materialize a data matrix and sample_info table from the
# exampleFacileDataSet, then correct the data matrix.
efds <- exampleFacileDataSet()
sample.info <- efds %>%
  filter_samples(indication == "CRC") %>%
  with_sample_covariates()
m <- fetch_assay_data(sample.info, normalized = TRUE, as.matrix = TRUE)
m.rmsex <- remove_batch_effect(m, sample.info, "sex")

# this functionality is called internally from fetch_assay_data to make
# your life easy from within the facile ecosystem itself
m2 <- fetch_assay_data(sample.info, normalized = TRUE, batch = "sex",
                       as.matrix = TRUE)
all.equal(m.rmsex, m2)
#> [1] TRUE