Generalized Principal Component Analysis

Jul-3-2019, arXiv.org Machine Learning

Principal component analysis (PCA) [1] is widely used to reduce the dimensionality of large datasets. However, it implicitly optimizes an objective function that is equivalent to a Gaussian likelihood. Hence, PCA may be inappropriate for data that do not follow the normal distribution, such as nonnegative, discrete counts. A motivating example of count data comes from single-cell gene expression profiling (scRNA-Seq), where each observation represents a cell and the features are genes. Such data are often highly sparse (90% zeros) and exhibit skewed distributions that are poorly matched by Gaussian noise. To remedy this, Collins [2] proposed generalizing PCA to the exponential family in a manner analogous to the generalization of linear regression to generalized linear models. Here, we provide a detailed derivation of generalized PCA (GLM-PCA) with a focus on optimization using Fisher scoring. We also expand on Collins' model by incorporating covariates, and we propose post hoc transformations to enhance the interpretability of latent factors.
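The claim that PCA implicitly optimizes a Gaussian likelihood can be illustrated with a minimal NumPy sketch. It is not the GLM-PCA algorithm from the paper. It only checks, on synthetic sparse counts, that the rank-k PCA reconstruction minimizes squared error, and that a Gaussian negative log-likelihood with fixed variance is that squared error plus a constant. The function names, the Poisson rate of 0.1, the matrix size, and the choice k = 2 are all assumptions made for illustration.

```python
import numpy as np

def pca_reconstruct(Y, k):
    """Rank-k PCA reconstruction of Y (rows = observations) via truncated SVD."""
    mu = Y.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(Y - mu, full_matrices=False)
    return mu + (U[:, :k] * s[:k]) @ Vt[:k]

def gaussian_nll(Y, M, sigma=1.0):
    """Negative log-likelihood of Y under independent N(M_ij, sigma^2) entries."""
    sse = np.sum((Y - M) ** 2)
    return 0.5 * Y.size * np.log(2 * np.pi * sigma**2) + sse / (2 * sigma**2)

rng = np.random.default_rng(0)
# Synthetic sparse counts (assumed stand-in for scRNA-Seq-like data).
Y = rng.poisson(0.1, size=(200, 50)).astype(float)
mu = Y.mean(axis=0, keepdims=True)

# PCA reconstruction with k = 2 components.
M_pca = pca_reconstruct(Y, k=2)
sse_pca = np.sum((Y - M_pca) ** 2)
nll_pca = gaussian_nll(Y, M_pca)

# A competing rank-2 reconstruction: project onto a random orthonormal basis.
Q, _ = np.linalg.qr(rng.normal(size=(Y.shape[1], 2)))
M_alt = mu + (Y - mu) @ Q @ Q.T
sse_alt = np.sum((Y - M_alt) ** 2)
nll_alt = gaussian_nll(Y, M_alt)

# Lower squared error means lower Gaussian NLL, so PCA is the Gaussian optimum.
print(sse_pca <= sse_alt, nll_pca <= nll_alt)

# PCA reconstructions are not constrained to be nonnegative, unlike count means.
print("fraction of negative reconstructed entries:", np.mean(M_pca < 0))
```

Because the Gaussian objective places no constraint on the sign or discreteness of the fitted means, it can fit count data poorly, which is the motivation for replacing it with an exponential-family likelihood such as the Poisson.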
