Maximally Correlated Principal Component Analysis

Feb-21-2017–arXiv.org Machine Learning

Soheil Feizi and David Tse
Stanford University
arXiv:1702.05471v2

Abstract

In the era of big data, reducing data dimensionality is critical in many areas of science. The widely used Principal Component Analysis (PCA) addresses this problem by computing a low-dimensional data embedding that maximally explains the variance of the data. However, PCA has two major weaknesses. First, it only considers linear correlations among variables (features), and second, it is not suitable for categorical data. We resolve these issues by proposing Maximally Correlated Principal Component Analysis (MCPCA). MCPCA computes transformations of variables whose covariance matrix has the largest Ky Fan norm. Variable transformations are unknown, can be nonlinear, and are computed in an optimization. MCPCA can also be viewed as a multivariate extension of Maximal Correlation. For jointly Gaussian variables we show that the covariance matrix corresponding to the identity (or the negative of the identity) transformations majorizes covariance matrices of non-identity functions. Using this result we characterize global MCPCA optimizers for nonlinear functions of jointly Gaussian variables for every rank constraint. For categorical variables we characterize global MCPCA optimizers for the rank one constraint based on the leading eigenvector of a matrix computed using pairwise joint distributions. For a general rank constraint we propose a block coordinate descent algorithm and show its convergence to stationary points of the MCPCA optimization. We compare MCPCA with PCA and other state-of-the-art dimensionality reduction methods including Isomap, LLE, multilayer autoencoders (neural networks), kernel PCA, probabilistic PCA and diffusion maps on several synthetic and real datasets. We show that MCPCA consistently provides improved performance compared to other methods.

1 Introduction

Let $X_1$ and $X_2$ be two zero-mean, unit-variance random variables. Pearson's correlation [1], defined as

$$\rho_{\text{Pearson}}(X_1, X_2) \triangleq E[X_1 X_2], \tag{1.1}$$

is a basic statistical parameter and plays a central role in many statistical and machine learning methods such as linear regression [2], principal component analysis [3], and support vector machines [4], partially owing to its simplicity and computational efficiency. Pearson's correlation, however, has two main weaknesses: first, it only captures linear dependency between variables, and second, for discrete (categorical) variables its value depends somewhat arbitrarily on the labels. To overcome these weaknesses, Maximal Correlation (MC) has been proposed and [...] MC tackles the two main drawbacks of Pearson's correlation: it models a family of nonlinear relationships between the two variables.
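The label dependence of Pearson's correlation can be illustrated with a small numerical sketch. The joint distribution below is made up for illustration, and the function names `pearson_from_joint` and `maximal_correlation` are hypothetical, not taken from the paper. For finite alphabets with strictly positive marginals, maximal correlation is known to equal the second-largest singular value of the joint distribution matrix normalized by the square roots of its marginals. This excerpt does not state that characterization, so it is used here as an assumption.

```python
import numpy as np


def pearson_from_joint(joint, x_labels, y_labels):
    """Pearson's correlation of two discrete variables.

    joint[i, j] is P(X = i-th category, Y = j-th category); the numeric
    labels assigned to the categories are passed in explicitly.
    """
    joint = np.asarray(joint, dtype=float)
    x = np.asarray(x_labels, dtype=float)
    y = np.asarray(y_labels, dtype=float)
    px, py = joint.sum(axis=1), joint.sum(axis=0)  # marginals
    xc = x - px @ x  # center the labels
    yc = y - py @ y
    cov = xc @ joint @ yc
    return cov / np.sqrt((px @ xc**2) * (py @ yc**2))


def maximal_correlation(joint):
    """Maximal correlation of two discrete variables (assumed SVD form).

    Requires strictly positive marginals. The largest singular value of
    the normalized matrix is always 1; the second one is returned.
    """
    joint = np.asarray(joint, dtype=float)
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    normalized = joint / np.sqrt(np.outer(px, py))
    singular_values = np.linalg.svd(normalized, compute_uv=False)
    return singular_values[1]


# Illustrative joint distribution of two 3-category variables.
joint = np.array([
    [0.30, 0.02, 0.01],
    [0.02, 0.30, 0.02],
    [0.01, 0.02, 0.30],
])

# Same distribution, two different numeric labelings of X's categories.
r_natural = pearson_from_joint(joint, [0, 1, 2], [0, 1, 2])
r_relabeled = pearson_from_joint(joint, [0, 2, 1], [0, 1, 2])

# Maximal correlation uses no labels, so reordering categories
# (permuting rows of the joint table) leaves it unchanged.
mc = maximal_correlation(joint)
mc_permuted = maximal_correlation(joint[[0, 2, 1], :])

print("Pearson, natural labels:  ", r_natural)
print("Pearson, relabeled X:     ", r_relabeled)
print("Maximal correlation:      ", mc)
print("Maximal corr., permuted X:", mc_permuted)
```

In this sketch the two Pearson values differ even though the underlying distribution is the same, while the maximal correlation is unaffected by how the categories are ordered or named.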


