vignettes/FacileAnalysis-fpca.Rmd
FacileAnalysis-fpca.Rmd
Under Construction
Easy to run a PCA over the rnaseq data, or whatever other assy you got there (CNV, proteomics, take your pick)
library(FacileData)
library(FacileAnalysis)
library(dplyr)
pca.crc <- exampleFacileDataSet() %>%
filter_samples(indication == "CRC", sample_type == "tumor") %>%
fpca(assay_name = "rnaseq")
Vizualize the results:
viz(pca.crc, color_aes = "subtype_crc_cms", shape_aes = "sex")
Need to fix up report()
Not sure how to compare()
two PCA results …
The principal components are a new basis of orthogonal axes that are linear combinations of the original axes, which in this examples are genes. By exploring which genes (and gene sets) are most highly loaded along each principal component, analysts can begin to better understand what biological processes might be driving the variability in their data.
The ranks(fpca())
result extracts the feature (gene) loadings onto the principal components. signature(fpca())
extracts the top N ntop
features from each principal component.
The code below returns the top 10 features over the first three PCs. The score
column corresponds to the loading:
pca.sig <- signature(pca.crc, dims = 1:3, ntop = 10)
pca.sig %>%
tidy() %>%
select(feature_id, symbol, dimension, score)
#> # A tibble: 60 × 4
#> feature_id symbol dimension score
#> <chr> <chr> <chr> <dbl>
#> 1 340146 SLC35D3 PC1 0.0760
#> 2 58529 MYOZ1 PC1 0.0729
#> 3 7123 CLEC3B PC1 0.0714
#> 4 259282 BOD1L PC1 0.0684
#> 5 29988 SLC2A8 PC1 0.0683
#> 6 124912 SPACA3 PC1 0.0683
#> 7 4931 NVL PC1 0.0680
#> 8 10564 ARFGEF2 PC1 0.0665
#> 9 3158 HMGCS2 PC1 0.0659
#> 10 2745 GLRX PC1 0.0650
#> # … with 50 more rows
When we map the expression values of the genes to the PCA viz
, we can see that the expression of the genes with high abs(score)
values should track with its position on its principal component.
# We need to hack the expression values for these genes into the pca result for
# now, but we are working to upgrade the data retrieval abilities around
# FacileAnalysisResult objects so that this can be made effortless.
pcgenes <- c(IL8 = "3576", ARFGEF2 = "10564", BRIP1 = "83990")
pca.crc$result <- pca.crc$result %>%
with_assay_data(pcgenes)
# The following issue tracks our intent to enable easier query/retrieval of
# data from different places to make interactive exploration more facile:
#
# https://github.com/facileverse/FacileData/issues/8
#
# Instead of hacking the gene expression data back into the pca.crc$result
# tibble, we should be able to do something like this:
#
# viz(pca.crc, color_aes = "feature:IL8|name")
For example, we can see that IL8 is highly loaded on PC1:
viz(pca.crc, color_aes = "IL8", title = "IL8")
ARFGEF2 is also highly loaded on PC1, but witha flipped sign:
viz(pca.crc, color_aes = "ARFGEF2", title = "ARFGEF2")
and BRIP1 is a highly loaded gene on PC2
viz(pca.crc, color_aes = "BRIP1", title = "BRIP1")