FacileDataSet is a reference data storage implementation that
implements the FacileData Access API. It facilitates the storage and
retrieval of large amounts of data by leveraging a SQLite database to store
sample- and feature-level metadata ("pData" and "fData"), and an HDF5
file to store all of the dense assay (matrix) data (gene counts, microarray
intensities, etc.) run over the samples.
FacileDataSet( path, data.fn = NULL, sqlite.fn = NULL, hdf5.fn = NULL, meta.fn = NULL, anno.dir = NULL, cache_size = 80000, db.loc = c("reference", "temporary", "memory"), ... )
path: The path to the FacileData repository
data.fn: A custom path to the database (probably don't mess with this)
sqlite.fn: name of SQLite data file in FacileDataSet
hdf5.fn: name of HDF5 data file in FacileDataSet
meta.fn: name of metadata YAML data file in FacileDataSet
anno.dir: A directory to house custom annotations/sample covariates
cache_size: A custom parameter for the SQLite database
db.loc: single character, location for the data
...: other args to pass down, not used at the moment
A custom path to the yaml file that has covariate mapping info
A FacileDataSet is materialized on disk by a well-structured directory,
which minimally includes the following items:
data.sqlite: SQLite database that stores feature and sample metadata.
data.h5: HDF5 file that stores a multitude of dense assay matrices that
are generated from the assays performed on the samples in the
FacileDataSet.
meta.yaml: file that contains information about the FacileDataSet.
To better understand the structure and contents of this file, you can
refer to the following:
a. The included testdata/expected-meta.yaml file, which serves as an
exemplar.
b. The help file provided by the eav_metadata_create() function, which
describes in greater detail how we track a dataset's sample-level
covariates (aka "pData" in the Bioconductor world).
In the meantime, a short description of the entries found in the
meta.yaml file is provided here:
name: the name of the dataset.
the organism the data come from (e.g. "Mus musculus").
default_assay: the name of the assay to use by default if none is
specified in calls that fetch assay data (kind of like how "exprs" is the
default assay used when working with a Bioconductor ExpressionSet).
datasets: a section that enumerates the datasets included internally.
The datasets are further enumerated.
sample_covariates: a section that enumerates the covariates that
are tracked over the samples inside the FacileDataSet (i.e. a mapping
of the pData for the samples). Reference eav_metadata_create()
for more information.
custom-annotation: directory that stores custom sample covariate
(aka "pData") information that analysts can identify and describe during
the course of an analysis, or even add from external sources. Although
this directory is required in the directory structure of a valid
FacileDataSet, the FacileDataSet() constructor can be called with a custom
anno.dir parameter so that custom annotations are stored elsewhere.
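The directory layout described above can be checked with a few lines of base R. The sketch below is illustrative only: check_fds_layout() is a hypothetical helper, not part of FacileData, and it assumes the default file names listed above (data.sqlite, data.h5, meta.yaml, custom-annotation). It tests only for presence on disk; it does not open the files or validate their contents.

```r
# Hypothetical helper (NOT a FacileData function): report which of the
# documented items exist under a FacileDataSet directory.
# Assumption: default file names; `anno.dir` may point somewhere else,
# mirroring the idea that custom annotations can be stored outside `path`.
check_fds_layout <- function(path,
                             anno.dir = file.path(path, "custom-annotation")) {
  files <- file.path(path, c("data.sqlite", "data.h5", "meta.yaml"))
  present <- c(
    file.exists(files) & !dir.exists(files),  # must be files, not directories
    dir.exists(anno.dir)                      # must be a directory
  )
  names(present) <- c("data.sqlite", "data.h5", "meta.yaml",
                      "custom-annotation")
  present
}

# Usage: build a throwaway directory that mimics part of the layout
# (empty placeholder files, no data.h5), then check it.
demo <- file.path(tempdir(), "demoFacileDataSet")
dir.create(file.path(demo, "custom-annotation"),
           recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo, c("data.sqlite", "meta.yaml"))))

layout <- check_fds_layout(demo)
print(layout)
```

Any entry reported as FALSE points at an item that is missing from the directory. In the demo directory that is data.h5, which was deliberately left out.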
fn <- system.file("extdata", "exampleFacileDataSet", package = "FacileData")
fds <- FacileDataSet(fn)