Sparse GCA and Thresholded Gradient Descent

Gao, Sheng; Ma, Zongming

arXiv.org Machine Learning 

With the advent of big data acquisition technology, it has become increasingly important to integrate information across multiple datasets collected on a common set of subjects. Canonical correlation analysis (CCA), first proposed by Hotelling [20], is a widely used statistical tool for integrating information from two datasets: it seeks a linear combination of the variables within each dataset such that the correlation between the two combinations is maximized. However, recent advances in fields such as multi-omics and multimodal brain imaging present new challenges, since scientists can now often collect more than two datasets on the same set of subjects. To tackle these challenges, we turn to a useful generalization of CCA called generalized correlation analysis (GCA) [23], which aims to explore linear relationships across multiple data sources. Kettenring [23] proposed five different techniques for generalized correlation analysis of multiple datasets; the methods differ in the objective function of covariances and correlations that they maximize, subject to certain normalization constraints.
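To make the two-dataset case concrete, the following is a minimal sketch of classical (non-sparse) CCA, computed by whitening each dataset and taking a singular value decomposition of the whitened cross-covariance. It is an illustration only, not the thresholded gradient descent method of this paper: the function name `first_canonical_pair`, the `ridge` parameter, and the simulated data are assumptions introduced for the example.

```python
import numpy as np

def first_canonical_pair(X, Y, ridge=1e-8):
    """Leading canonical directions (a, b) and canonical correlation rho.

    X is n x p and Y is n x q, with rows indexing the same n subjects.
    `ridge` is a small diagonal term (an assumption of this sketch) that
    keeps the sample covariance matrices numerically positive definite.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each variable
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + ridge * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + ridge * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Cholesky factors: Sxx = Lx Lx^T and Syy = Ly Ly^T
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T

    # The top singular triple of M gives the leading canonical pair
    U, s, Vt = np.linalg.svd(M)
    a = np.linalg.solve(Lx.T, U[:, 0])   # satisfies a^T Sxx a = 1
    b = np.linalg.solve(Ly.T, Vt[0, :])  # satisfies b^T Syy b = 1
    return a, b, s[0]

# Simulated example: both datasets carry a noisy copy of a shared signal z
rng = np.random.default_rng(0)
z = rng.normal(size=500)
X = np.column_stack([z + 0.1 * rng.normal(size=500), rng.normal(size=500)])
Y = np.column_stack([rng.normal(size=500),
                     z + 0.1 * rng.normal(size=500),
                     rng.normal(size=500)])
a, b, rho = first_canonical_pair(X, Y)
empirical = np.corrcoef(X @ a, Y @ b)[0, 1]  # correlation of the two combinations
```

Here `rho` is the maximized sample correlation, and `empirical` recomputes it directly from the two linear combinations `X @ a` and `Y @ b`. With more than two datasets there is no single correlation to maximize, which is why GCA admits several different objective functions.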