We rely on more meta-information about our data types than is "usual", so it's useful to know which type of identifier each assay uses. This function tries to guess whether an identifier is an Ensembl gene identifier, an Entrez ID, a RefSeq accession, etc.

infer_feature_type(x, with_organism = FALSE, ...)

Arguments

x

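a character vector of identifiers

with_organism

logical; if TRUE, also guess the organism for each identifier and return it in an organism column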

Value

A data.frame with columns id (the input x) and id_type. If with_organism = TRUE, a third column, organism, holds a guess for the organism.

Details

When with_organism = TRUE, the organism column is "unknown" for identifiers whose format carries no organism information (like RefSeq).
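As a rough illustration of how an organism can be read off an identifier, the sketch below maps Ensembl gene ID prefixes to species and falls back to "unknown" otherwise. The function name `guess_organism_sketch`, the `ensembl_prefixes` lookup, and the organism labels are hypothetical assumptions, not this package's implementation; only two prefixes (ENSG for human, ENSMUSG for mouse) are listed.

```r
# Hypothetical prefix-to-organism lookup (illustrative only; the labels and
# coverage are assumptions, not the package's internal table).
ensembl_prefixes <- c(ENSG = "Homo sapiens", ENSMUSG = "Mus musculus")

# Hypothetical helper: return the organism for each id, or "unknown" when no
# prefix matches (e.g. RefSeq accessions or numeric Entrez IDs).
guess_organism_sketch <- function(x, prefixes = ensembl_prefixes) {
  vapply(x, function(id) {
    # startsWith() is vectorised over the prefixes, giving one logical per prefix
    hit <- prefixes[startsWith(id, names(prefixes))]
    if (length(hit) >= 1) unname(hit[[1]]) else "unknown"
  }, character(1), USE.NAMES = FALSE)
}

guess_organism_sketch(c("ENSG00000101811", "ENSMUSG00000030088.2", "NC_000023"))
```

Because "ENSMUSG..." does not start with "ENSG", the two prefixes never collide here; a fuller table would need to order longer prefixes first if any were nested.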

If an identifier matches more than one id_type, its id_type is set to "ambiguous". If it matches none of the guesses, its id_type is "unknown".
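The matching rule above can be sketched with one regular expression per id_type: exactly one match gives that type, several give "ambiguous", and none gives "unknown". The function `guess_id_type_sketch` and the patterns in `id_patterns` are hypothetical assumptions for illustration; the package's real patterns may differ.

```r
# Hypothetical patterns, one per id_type (assumptions, not the package's own).
id_patterns <- c(
  ens_gene = "^ENS[A-Z]*G[0-9]{11}(\\.[0-9]+)?$",                 # optional version suffix
  refseq   = "^(NC|NG|NM|NR|NP|XM|XR|XP)_[0-9]+(\\.[0-9]+)?$",
  entrez   = "^[0-9]+$"
)

# Hypothetical helper implementing the "ambiguous"/"unknown" resolution rule.
guess_id_type_sketch <- function(x, patterns = id_patterns) {
  id_type <- vapply(x, function(id) {
    # Names of every pattern this single id matches
    matched <- names(patterns)[vapply(patterns, grepl, logical(1), x = id)]
    if (length(matched) == 1) {
      matched
    } else if (length(matched) > 1) {
      "ambiguous"
    } else {
      "unknown"
    }
  }, character(1), USE.NAMES = FALSE)
  data.frame(id = x, id_type = id_type, stringsAsFactors = FALSE)
}

guess_id_type_sketch(c("NC_000023", "ENSG00000101811", "85007", "foo"))
```

Passing overlapping patterns (for example, two patterns that both accept "85007") makes that id come back as "ambiguous".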

Examples

fids <- c("NC_000023", "ENSG00000101811", "ENSMUSG00000030088.2", "85007")
infer_feature_type(fids)
#> # A tibble: 4 × 2
#>   id                   id_type
#>   <chr>                <chr>
#> 1 NC_000023            refseq
#> 2 ENSG00000101811      ens_gene
#> 3 ENSMUSG00000030088.2 ens_gene
#> 4 85007                entrez