pData
data.frame
Source: R/entity-attribute-value.R (eav-metadata.Rd)
Sample covariates (aka pData) are encoded in an entity-attribute-value (EAV) table. Metadata about these covariates are stored in a meta.yaml file in the FacileDataSet directory, which enables the FacileDataSet to cast the value stored in the EAV table to its native R type. This function generates the list-of-lists structure to represent the sample_covariates section of the meta.yaml file.
    eav_metadata_create(x, ignore = c("dataset", "sample_id"), covariate_def = list())

Argument | Description |
---|---|
x | a |
ignore | the columns in |
covariate_def | a named list of covariate definitions. The names of this list are the names the covariates will be called in the target |
a list-of-lists that encodes the sample_covariates section of the meta.yaml file for a FacileDataSet. Each list element will have the following elements:
- arguments: the name(s) of the columns from x used in this covariate description.
- class: "real", "categorical", (survival needs a bit of work)
- description: a string with minimal description
- type: this isn't really used in the dataset, but another application might want to group covariates by type.
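The list-of-lists structure described above can be sketched in base R for simple, single-column covariates. This is only an illustration of the shape of the result, not the package's implementation: the helper name `sketch_covariate_meta`, the argument name `x` inside `arguments`, the generated descriptions, and the placeholder type `"general"` are all assumptions made for this example.

```r
# Hypothetical helper (not part of FacileData): builds one definition per
# column of `x`, skipping the columns listed in `ignore`.
sketch_covariate_meta <- function(x, ignore = c("dataset", "sample_id")) {
  stopifnot(is.data.frame(x))
  cols <- setdiff(colnames(x), ignore)
  out <- lapply(cols, function(cname) {
    vals <- x[[cname]]
    list(
      arguments = list(x = cname),                 # assumed argument name
      class = if (is.numeric(vals)) "real" else "categorical",
      description = paste("Covariate derived from column", cname),
      type = "general"                             # placeholder grouping
    )
  })
  names(out) <- cols
  out
}

pdat <- data.frame(
  dataset = "study1",
  sample_id = c("s1", "s2", "s3"),
  age = c(54, 61, 47),
  sex = c("f", "m", "f"),
  stringsAsFactors = FALSE
)
meta <- sketch_covariate_meta(pdat)
str(meta$age)
```

Each element of `meta` is named after the covariate it describes, which mirrors how the names of the returned list become the covariate names.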
For simple pData covariates, each column is treated independently from the rest. Some types of covariates require multiple columns for proper encoding. Survival information, for example, requires a pair of values that indicate the "time to event" and the status of the event (death or censored). In these cases, the caller needs to provide an entry in the covariate_def list that describes which pData columns (varname) go into the single facile covariate value.
Please refer to the Encoding Survival Covariates section for a more detailed description of how to encode survival information into the EAV table using the covariate_def parameter. Further examples of how to encode other complex attributes will be added as they are required, but you can reference the Encoding Arbitrarily Complex Covariates section for some more information.
UPDATE: FacileData can now use survival data encoded as a survival::Surv object stored as a pData column. Read on for the original encoding strategy, which is still implemented.
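As a minimal sketch of what a survival::Surv object stored as a pData column can look like, the example below builds one from two ordinary columns. The data and column names are made up for illustration, and how FacileData consumes such a column is not shown here.

```r
library(survival)

# Made-up sample table with time-to-event and event-status columns
pdat <- data.frame(
  dataset = "study1",
  sample_id = c("s1", "s2", "s3"),
  tte_OS = c(120, 365, 42),
  event_OS = c(1, 0, 1)
)

# Surv(time, event) bundles both vectors into a single object, which can be
# stored as one data.frame column
pdat$OS <- Surv(pdat$tte_OS, pdat$event_OS)
class(pdat$OS)
```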
Survival data in R is typically encoded by two vectors: one that indicates the "time to event" (tte), and a second that indicates whether the denoted tte is an "event" (1) or "censored" (0).
Normally these vectors appear as two columns in an experiment's pData, and therefore need to be encoded into the FacileDataSet's EAV table. To do so, the pair of vectors is turned into a signed numeric value. The absolute value of the numeric indicates the "time to event" and the sign of the value indicates its censoring status.
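The signed-value idea can be sketched in a few lines of base R. The functions `encode_signed` and `decode_signed` are hypothetical names used only here, and the sign convention (positive for events, negative for censored observations) is an assumption; consult eav_encode_right_censored() for the convention the package actually uses.

```r
# Assumption: events are stored as positive values and censored observations
# as negative values. A time of exactly 0 cannot carry a sign, so this sketch
# requires strictly positive times.
encode_signed <- function(time, event) {
  stopifnot(all(time > 0), all(event %in% c(0, 1)))
  ifelse(event == 1, time, -time)
}

# Recover both vectors from the signed value
decode_signed <- function(value) {
  data.frame(time = abs(value), event = as.integer(value > 0))
}

tte_OS <- c(120, 365, 42)
event_OS <- c(1, 0, 1)

encoded <- encode_signed(tte_OS, event_OS)  # values: 120, -365, 42
decoded <- decode_signed(encoded)
```

Packing both vectors into one number keeps the survival covariate in a single row of the EAV table, at the cost of needing an agreed sign convention on both the encoding and decoding side.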
Let's assume we have tte_OS and event_OS columns that are used to encode a patient's overall survival (time and censor status). To store this as an "OS" covariate in the EAV table, a covariate_def list-of-lists definition that captures this encoding would look like this:
    covariate_def <- list(
      OS = list(
        class = "right_censored",
        arguments = list(time = "tte_OS", event = "event_OS"),
        label = "Overall Survival",
        type = "clinical",
        description = "Overall survival in days"))
Note how the name of the list entry in covariate_def defines the name of the covariate in the FacileDataSet. The class entry for the OS definition indicates the type of variable this is. The arguments section is only used when encoding a wide pData into the EAV value column. names(arguments) correspond to the parameters of the eav_encode_right_censored() function, and their values are the columns in the target pData that populate the respective parameters in the function call. The analogous meta.yaml entry in the sample_covariates section for the "OS" covariate_def entry looks like so:
    sample_covariates:
      OS:
        class: right_censored
        arguments:
          time: tte_OS
          event: event_OS
        label: "Overall Survival"
        type: "clinical"
        description: "Overall survival in days"
To encode a new type of complex covariate from a wide pData data.frame, we need to:
1. Specify a new class (like "right_censored") for use within a FacileDataSet.
2. Define an eav_encode_<class>(arg1, arg2, ...) function which takes the R data vectors (arg1, arg2) and converts them into a single value for the EAV table.
3. Define an eav_decode_<class>(x, attrname, def, ...) function which takes the single value in the EAV table and casts it back into the R-native data vector(s).
   - x is the vector of (character) values from the EAV table
   - attrname is the name of the covariate in the EAV table
   - def is the definition-list for this covariate.
   - ... allows each decode function to be further customized.
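The steps above can be sketched with an encode/decode pair for a made-up class. The "interval" class, its `lower`/`upper` arguments, and the `"lower;upper"` string format are hypothetical and exist only for this illustration; only the eav_encode_<class> and eav_decode_<class> naming pattern and the decode signature come from the description above. How the package looks up and calls these functions is not shown.

```r
# Hypothetical "interval" class: two numeric columns stored in the EAV table
# as a single "lower;upper" string.
eav_encode_interval <- function(lower, upper, ...) {
  stopifnot(length(lower) == length(upper))
  paste(lower, upper, sep = ";")
}

# x: character values from the EAV table
# attrname: name of the covariate in the EAV table (unused in this sketch)
# def: definition-list for the covariate (unused in this sketch)
eav_decode_interval <- function(x, attrname = character(), def = list(), ...) {
  parts <- strsplit(as.character(x), ";", fixed = TRUE)
  data.frame(
    lower = as.numeric(vapply(parts, `[`, character(1), 1L)),
    upper = as.numeric(vapply(parts, `[`, character(1), 2L))
  )
}

encoded <- eav_encode_interval(c(1.5, 10), c(2.5, 20))
decoded <- eav_decode_interval(encoded, attrname = "dose_range")
```

The round trip from the original vectors to `encoded` and back to `decoded` is the property worth testing for any new class, since the EAV table only ever holds the single encoded value.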
    # covariate_def definition to take the tte_OS and event_OS columns and turn
    # them into a facile "OS" right_censored survival covariate
    cc <- list(
      OS = list(
        arguments = list(time = "tte_OS", event = "event_OS"),
        label = "Overall Survival",
        class = "right_censored",
        type = "clinical",
        description = "Overall survival in days"))