Ensembl GTF files provide biotype information for genes and transcripts, with values such as "3prime_overlapping_ncRNA", "antisense", ..., "protein_coding", etc. This function turns the biotype vector x into a factor whose levels are (roughly) in the order of priority used when reducing several biotypes to a single, unique one. That is, if a gene has a "protein_coding" annotation, we want to keep that annotation rather than another one that categorizes the gene as a "processed_transcript".
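The idea can be sketched as a factor whose levels follow a priority list. This is a minimal sketch, not the package's implementation: the function name level_biotypes_sketch, the short biotype_priority vector, and the convention that the most preferred biotype comes first are all hypothetical assumptions made for illustration. Biotypes missing from the list are appended after it so that they do not turn into NA.

```r
# Hypothetical priority order, most preferred biotype first (an assumption;
# the real ordering used by .level_biotypes() may differ and be much longer).
biotype_priority <- c("protein_coding", "lincRNA", "antisense",
                      "processed_transcript", "3prime_overlapping_ncRNA")

# Hypothetical helper: convert a character vector of biotypes to a factor
# whose levels follow `priority`.
level_biotypes_sketch <- function(x, priority = biotype_priority) {
  stopifnot(is.character(x))
  # Biotypes not in the priority list are appended after the known ones,
  # in order of first appearance, so they are kept rather than becoming NA.
  extra <- setdiff(unique(x[!is.na(x)]), priority)
  factor(x, levels = c(priority, extra))
}

f <- level_biotypes_sketch(c("processed_transcript", "protein_coding", "misc_RNA"))
levels(f)
```

Because the levels are fixed up front, comparing or sorting the factor's integer codes ranks biotypes by preference rather than alphabetically.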

.level_biotypes(x)

Arguments

x

a character vector of biotypes

Value

a factor version of x whose levels are ordered approximately by biotype priority.
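As an illustration of how such a factor can be used to keep one biotype per gene, the sketch below picks, for each gene, the biotype with the smallest level index. It assumes (hypothetically) that the first level is the most preferred; the gene IDs, biotypes, and level vector are made-up stand-ins for the output of .level_biotypes().

```r
# Made-up input: several biotype annotations per gene.
gene_id <- c("g1", "g1", "g2", "g2", "g3")
biotype <- c("processed_transcript", "protein_coding",
             "antisense", "processed_transcript", "lincRNA")

# Stand-in for .level_biotypes(biotype): a factor with levels in an
# assumed priority order, most preferred first.
lev <- c("protein_coding", "lincRNA", "antisense",
         "processed_transcript", "3prime_overlapping_ncRNA")
bt <- factor(biotype, levels = lev)

# For each gene, keep the biotype whose level comes first (smallest code).
best <- tapply(as.integer(bt), gene_id, min)
best_biotype <- setNames(levels(bt)[best], names(best))
best_biotype
```

Here g1 keeps "protein_coding" over "processed_transcript", g2 keeps "antisense", and g3 keeps its only biotype, "lincRNA".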