A Randomized Rounding Algorithm for Sparse PCA

Kimon Fountoulakis, Abhisek Kundu, Eugenia-Maria Kontopoulou, Petros Drineas

arXiv.org Machine Learning 

Large matrices are a common way of representing modern, massive datasets, since an m × n real-valued matrix X provides a natural structure for encoding information about m objects, each of which is described by n features. Principal Components Analysis (PCA) and the Singular Value Decomposition (SVD) are fundamental data analysis tools, expressing a data matrix in terms of a sequence of orthogonal vectors of decreasing importance. While these vectors enjoy strong optimality properties and are often interpreted as fundamental latent factors that underlie the observed data, they are linear combinations of up to all the data points and features. As a result, they are notoriously difficult to interpret in terms of the underlying processes generating the data [MD09]. The seminal work of [dGJ07] introduced the concept of Sparse PCA, where sparsity constraints are enforced on the singular vectors in order to improve interpretability. As noted in [dGJ07, MD09, PDK13], an example where sparsity implies interpretability is document analysis, where sparse principal components can be mapped to specific topics by inspecting the (few) keywords in their support.
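To make the interpretability issue concrete, the following sketch contrasts a dense top singular vector with a sparse surrogate. It uses a naive hard-thresholding step purely for illustration; this is NOT the paper's randomized rounding algorithm, and the matrix, sparsity level k, and threshold are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))  # 100 objects, each with 20 features

# Top right singular vector of X: the leading PCA direction.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
v = Vt[0]

# The PCA direction is typically dense: essentially every feature
# receives a nonzero weight, which hinders interpretation.
dense_support = np.count_nonzero(np.abs(v) > 1e-12)

# Naive sparsification for illustration only (not the paper's method):
# keep the k largest-magnitude entries and renormalize to unit length.
k = 5
idx = np.argsort(np.abs(v))[-k:]
v_sparse = np.zeros_like(v)
v_sparse[idx] = v[idx]
v_sparse /= np.linalg.norm(v_sparse)

# The sparse vector has only k nonzeros, so its support can be inspected
# directly, at the cost of capturing somewhat less variance than v.
print(dense_support, np.count_nonzero(v_sparse))
```

Since v maximizes the captured variance ||Xv|| over all unit vectors, any sparse unit vector can only do as well or worse; sparse PCA methods aim to make that trade-off as small as possible for a given support size.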
