On the Worst-Case Approximability of Sparse PCA
Chan, Siu On; Papailiopoulos, Dimitris; Rubinstein, Aviad
Principal component analysis (PCA) is one of the most popular tools for data analytics. PCA operates on data point vectors supported on features and outputs orthogonal directions (i.e., principal components) that maximize the explained variance. A limitation of PCA is that, in many cases of interest, the extracted principal components (PCs) are dense. However, in applications such as text analysis or gene expression analytics, having only a few nonzero features per extracted PC offers significantly higher interpretability. For example, in text analysis, where PCs are supported on words, if a PC consists of only a few words, then those words can be used to detect frequently occurring topics. Sparse PCA addresses the issue of interpretability directly by enforcing a sparsity constraint on the extracted PCs.
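Sparse PCA is commonly formalized as finding a unit vector x with at most k nonzero entries that maximizes the explained variance x^T A x, where A is the (symmetric, positive semidefinite) covariance matrix of the data. The sketch below makes that objective concrete by exhaustive search over all size-k supports; it is an illustration under this assumed formalization, not the method of the paper, and the function name `sparse_pc_bruteforce` is hypothetical. Its cost grows combinatorially with the number of features, so it is only usable on tiny inputs.

```python
from __future__ import annotations

from itertools import combinations

import numpy as np


def sparse_pc_bruteforce(A: np.ndarray, k: int) -> tuple[float, np.ndarray]:
    """Return (explained variance, k-sparse unit vector x) maximizing x^T A x.

    Hypothetical helper for illustration only. Assumes A is a symmetric
    PSD covariance matrix. Enumerates every support of size k, so the
    running time is exponential in k.
    """
    n = A.shape[0]
    best_val, best_x = -np.inf, np.zeros(n)
    for support in combinations(range(n), k):
        idx = list(support)
        # Restricted to this support, the best unit vector is the top
        # eigenvector of the k-by-k principal submatrix A[idx, idx].
        # eigh returns eigenvalues in ascending order, so take the last one.
        vals, vecs = np.linalg.eigh(A[np.ix_(idx, idx)])
        if vals[-1] > best_val:
            best_val = vals[-1]
            best_x = np.zeros(n)
            best_x[idx] = vecs[:, -1]
    return float(best_val), best_x
```

For a data matrix `X` with samples as rows, one would first center the columns and form `A = X.T @ X / m` (with `m` samples). Setting `k` equal to the number of features recovers the ordinary dense leading PC, and for smaller `k` the returned variance can never exceed the largest eigenvalue of `A`, which is the price paid for interpretability.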
Jul-21-2015