Zhang, Youwei, Ghaoui, Laurent E.

Sparse PCA provides a linear combination of small number of features that maximizes variance across data. Although Sparse PCA has apparent advantages compared to PCA, such as better interpretability, it is generally thought to be computationally much more expensive. In this paper, we demonstrate the surprising fact that sparse PCA can be easier than PCA in practice, and that it can be reliably applied to very large data sets. This comes from a rigorous feature elimination pre-processing result, coupled with the favorable fact that features in real-life data typically have exponentially decreasing variances, which allows for many features to be eliminated. We introduce a fast block coordinate ascent algorithm with much better computational complexity than the existing first-order ones. We provide experimental results obtained on text corpora involving millions of documents and hundreds of thousands of features. These results illustrate how Sparse PCA can help organize a large corpus of text data in a user-interpretable way, providing an attractive alternative approach to topic models.

Zhang, Youwei, Ghaoui, Laurent El

Technically speaking, PCA uses orthogonal projection of highly correlated variables to a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. This linear transformation is defined in such a way that the first principal component has the largest possible variance. It accounts for as much of the variability in the data as possible by considering highly correlated features. Each succeeding component in turn has the highest variance using the features that are less correlated with the first principal component and that are orthogonal to the preceding component.

Principal component regression (PCR) is a two-stage procedure: the first stage performs principal component analysis (PCA) and the second stage constructs a regression model whose explanatory variables are replaced by principal components obtained by the first stage. Since PCA is performed by using only explanatory variables, the principal components have no information about the response variable. To address the problem, we propose a one-stage procedure for PCR in terms of singular value decomposition approach. Our approach is based upon two loss functions, a regression loss and a PCA loss, with sparse regularization. The proposed method enables us to obtain principal component loadings that possess information about both explanatory variables and a response variable. An estimation algorithm is developed by using alternating direction method of multipliers. We conduct numerical studies to show the effectiveness of the proposed method.

Using statistical physicstechniques including the Gibbs distribution, binary decision fields and effective energies, we propose self-organizing PCA rules which are capable of resisting outliers while fulfilling various PCA-related tasks such as obtaining the first principal component vector,the first k principal component vectors, and directly finding the subspace spanned by the first k vector principal component vectorswithout solving for each vector individually. Comparative experimentshave shown that the proposed robust rules improve the performances of the existing PCA algorithms significantly whenoutliers are present.