D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis for Multiple High-dimensional Datasets

Hai Shu, Zhe Qu, Hongtu Zhu

arXiv.org Machine Learning 

Such studies include The Cancer Genome Atlas (TCGA; Hoadley et al., 2018), with multi-platform genomic data on tumor samples, and the Human Connectome Project (HCP; Van Essen et al., 2013), with multi-modal brain images of healthy adults, among many others (Crawford et al., 2016; Jensen et al., 2017). Using multiple data types can enhance our understanding of the etiology of many complex diseases, such as cancers (Ciriello et al., 2015; Campbell et al., 2018) and neurodegenerative diseases (Weiner et al., 2013; Saeed et al., 2017). Researchers have hence become highly interested in studying the shared information and individual features across multi-type datasets by separating their common and distinctive variation structures (van der Kloet et al., 2016; Smilde et al., 2017; Li et al., 2018).

Let $Y_k \in \mathbb{R}^{p_k \times n}$ be the $k$-th row-mean centered dataset, obtained on a common set of $n$ objects, for $k = 1, \dots, K$, where $p_k$ is the number of variables in the $k$-th dataset. One popular approach to disentangling their common and distinctive variation structures is to decompose each data matrix as

$$Y_k = X_k + E_k = C_k + D_k + E_k, \quad k = 1, \dots, K, \tag{1}$$

where $\{X_k\}_{k=1}^K$ are low-rank signal matrices and $\{E_k\}_{k=1}^K$ are additive noise matrices; $\{C_k\}_{k=1}^K$ are low-rank common-variation matrices representing the signal that arises from a common mechanism shared by all datasets; and $\{D_k\}_{k=1}^K$ are low-rank distinctive-variation matrices, each arising from a mechanism specific to its own dataset that is not shared by all datasets.
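To make the structure of model (1) concrete, here is a minimal NumPy sketch that generates toy data satisfying it. It assumes a simple illustrative generative setup that does not come from the source: each common-variation matrix is built as $A_k Z$ from one latent score matrix $Z$ shared by all $K$ datasets, and each distinctive-variation matrix as $B_k Z_k$ from scores specific to dataset $k$. The helper names `row_center` and `simulate_datasets`, and all dimensions and ranks, are hypothetical choices for illustration. This is not the D-GCCA definition of common variation and not its estimator.

```python
import numpy as np


def row_center(m):
    """Subtract each row's mean, so every variable has mean zero over the n objects."""
    return m - m.mean(axis=1, keepdims=True)


def simulate_datasets(p, n, r_common, r_dist, noise_sd=0.1, seed=0):
    """Build a toy instance of model (1): Y_k = X_k + E_k = C_k + D_k + E_k.

    Illustrative assumption (not the D-GCCA definition or estimator):
    C_k = A_k @ Z_common uses latent scores shared by all K datasets,
    while D_k = B_k @ Z_k uses scores specific to dataset k.
    """
    rng = np.random.default_rng(seed)
    # Common latent scores (r_common x n), shared across every dataset.
    z_common = row_center(rng.standard_normal((r_common, n)))
    datasets = []
    for p_k, r_k in zip(p, r_dist):
        # Common-variation matrix: generically rank r_common, row space spanned by z_common.
        c_k = rng.standard_normal((p_k, r_common)) @ z_common
        # Distinctive-variation matrix: generically rank r_k, driven by dataset-specific scores.
        z_k = row_center(rng.standard_normal((r_k, n)))
        d_k = rng.standard_normal((p_k, r_k)) @ z_k
        # Additive noise, row-centered so that Y_k stays row-mean centered.
        e_k = row_center(noise_sd * rng.standard_normal((p_k, n)))
        x_k = c_k + d_k  # low-rank signal matrix
        datasets.append({"Y": x_k + e_k, "X": x_k, "C": c_k, "D": d_k, "E": e_k})
    return datasets


data = simulate_datasets(p=[50, 80, 30], n=200, r_common=2, r_dist=[3, 1, 2])
for k, d in enumerate(data, start=1):
    assert np.allclose(d["Y"], d["C"] + d["D"] + d["E"])  # decomposition (1)
    assert np.allclose(d["Y"].mean(axis=1), 0.0)  # row-mean centered
    print(k, d["Y"].shape, np.linalg.matrix_rank(d["C"]), np.linalg.matrix_rank(d["D"]))

# Stacking the common parts keeps rank r_common, because they share one score space.
print(np.linalg.matrix_rank(np.vstack([d["C"] for d in data])))
```

Tying every $C_k$ to a single score matrix is just one simple way to encode a shared mechanism. The sketch only generates data that satisfy (1). It does not attempt the harder task of recovering $C_k$ and $D_k$ from the observed $Y_k$ alone.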
