The FacileDataSet is a reference data storage implementation that implements the FacileData Access API. It facilitates the storage and retrieval of large amounts of data by leveraging a SQLite database to store sample- and feature-level metadata ("pData" and "fData"), and an HDF5 file to store all of the dense assay (matrix) data (gene counts, microarray intensities, etc.) run over the samples.

FacileDataSet(
  path,
  data.fn = NULL,
  sqlite.fn = NULL,
  hdf5.fn = NULL,
  meta.fn = NULL,
  anno.dir = NULL,
  cache_size = 80000,
  db.loc = c("reference", "temporary", "memory"),
  ...
)

Arguments

path

The path to the FacileData repository

data.fn

A custom path to the database (probably don't mess with this)

sqlite.fn

name of SQLite data file in FacileDataSet

hdf5.fn

name of HDF5 data file in FacileDataSet

meta.fn

name of metadata YAML data file in FacileDataSet

anno.dir

A directory to house custom annotations/sample covariates

cache_size

A custom paramter for the SQLite database

db.loc

single character, location for the data

...

other args to pass down, not used at the moment

covdef.fn

A custom path to the yaml file that has covariate mapping info

Value

a FacileDataSet object

Details

A FacileDataSet is materialized on disk by a well-structured directory, which minimally includes the following items:

  1. A data.sqlite SQLite database that stores feature and sample metadata

  2. A data.h5 HDF5 file that stores a multitude of dense assay matrices that are generated from the assays performed on the samples in the FacileDataSet.

  3. A meta.yaml file tha contains informaiton about the FacileDataSet. To better understand the structure and contents of this file, you can refer to the following: a. The included testdata/expected-meta.yaml file, which is an exemplar file for exampleFacileDataSet(). b. The help file provided by the eav_metadata_create() function, which describes in greater detail how we track a dataset's sample-level covariates (aka, "pData" in the bioconductor world). In the meantime, a short description of the entries found in the meta.yaml file is provded here:

    • name: the name of the dataset (ie. "FacileTCGADataSet")

    • organism: "Homo sapiens", "Mus musculus", ec.

    • default_assay: the name of the assay to use by default if none is specified in calls to fetch_assay_data(), with_assay_data(), etc. (kind of like how "exprs" is the default assay used when working with a Biobase::ExpressionSet)

    • datasets: a section tha enumerates the datases included internally. The datasets are further enumerated.

    • sample_covariates: a section that enumerates the covariatets that are tracked over the samples inside the FacileDataSet (ie. a mapping of the pData for the samples). Reference eav_metadata_create() for more information.

  4. A custom-annotation directory, which stores custom sample_covariate (aka "pData") informaiton that analysts can identify and describe during the course of an analysis, or even add from external sources. Although this directory is required in the directory structure of a valid FacileDataSet, the FacileDataSet() constructor can be called with a custom anno.dir parameter so that custom annotations are stored elsewhere.

See also

Other FacileDataSet: dbfn(), hdf5fn(), meta_file()

Examples

fn <- system.file("extdata", "exampleFacileDataSet", package = "FacileData") fds <- FacileDataSet(fn)