Principal Component Analysis
A recursive divide-and-conquer approach for sparse principal component analysis
Zhao, Qian, Meng, Deyu, Xu, Zongben
In this paper, a new method is proposed for sparse PCA based on the recursive divide-and-conquer methodology. The main idea is to separate the original sparse PCA problem into a series of much simpler sub-problems, each having a closed-form solution. By recursively solving these sub-problems in an analytical way, an efficient algorithm is constructed to solve the sparse PCA problem. The algorithm only involves simple computations and is thus easy to implement. The proposed method can also be very easily extended to other sparse PCA problems with certain constraints, such as the nonnegative sparse PCA problem. Furthermore, we have shown that the proposed algorithm converges to a stationary point of the problem, and its computational complexity is approximately linear in both data size and dimensionality. The effectiveness of the proposed method is substantiated by extensive experiments implemented on a series of synthetic and real data in both reconstruction-error-minimization and data-variance-maximization viewpoints.
Large-Scale Sparse Principal Component Analysis with Application to Text Data
Zhang, Youwei, Ghaoui, Laurent El
Sparse PCA provides a linear combination of small number of features that maximizes variance across data. Although Sparse PCA has apparent advantages compared to PCA, such as better interpretability, it is generally thought to be computationally much more expensive. In this paper, we demonstrate the surprising fact that sparse PCA can be easier than PCA in practice, and that it can be reliably applied to very large data sets. This comes from a rigorous feature elimination pre-processing result, coupled with the favorable fact that features in real-life data typically have exponentially decreasing variances, which allows for many features to be eliminated. We introduce a fast block coordinate ascent algorithm with much better computational complexity than the existing first-order ones. We provide experimental results obtained on text corpora involving millions of documents and hundreds of thousands of features. These results illustrate how Sparse PCA can help organize a large corpus of text data in a user-interpretable way, providing an attractive alternative approach to topic models.
Sparse Principal Component Analysis with Constraints
Grbovic, Mihajlo (Temple University) | Dance, Christopher Roger (Xerox Research Centre Europe) | Vucetic, Slobodan (Temple University)
The sparse principal component analysis is a variant of the classical principal component analysis, which finds linear combinations of a small number of features that maximize variance across data. In this paper we propose a methodology for adding two general types of feature grouping constraints into the original sparse PCA optimization procedure.We derive convex relaxations of the considered constraints, ensuring the convexity of the resulting optimization problem. Empirical evaluation on three real-world problems, one in process monitoring sensor networks and two in social networks, serves to illustrate the usefulness of the proposed methodology.
Regularized Tensor Factorizations and Higher-Order Principal Components Analysis
High-dimensional tensors or multi-way data are becoming prevalent in areas such as biomedical imaging, chemometrics, networking and bibliometrics. Traditional approaches to finding lower dimensional representations of tensor data include flattening the data and applying matrix factorizations such as principal components analysis (PCA) or employing tensor decompositions such as the CANDECOMP / PARAFAC (CP) and Tucker decompositions. The former can lose important structure in the data, while the latter Higher-Order PCA (HOPCA) methods can be problematic in high-dimensions with many irrelevant features. We introduce frameworks for sparse tensor factorizations or Sparse HOPCA based on heuristic algorithmic approaches and by solving penalized optimization problems related to the CP decomposition. Extensions of these approaches lead to methods for general regularized tensor factorizations, multi-way Functional HOPCA and generalizations of HOPCA for structured data. We illustrate the utility of our methods for dimension reduction, feature selection, and signal recovery on simulated data and multi-dimensional microarrays and functional MRIs.
Large-Scale Sparse Principal Component Analysis with Application to Text Data
Zhang, Youwei, Ghaoui, Laurent E.
Sparse PCA provides a linear combination of small number of features that maximizes variance across data. Although Sparse PCA has apparent advantages compared to PCA, such as better interpretability, it is generally thought to be computationally much more expensive. In this paper, we demonstrate the surprising fact that sparse PCA can be easier than PCA in practice, and that it can be reliably applied to very large data sets. This comes from a rigorous feature elimination pre-processing result, coupled with the favorable fact that features in real-life data typically have exponentially decreasing variances, which allows for many features to be eliminated. We introduce a fast block coordinate ascent algorithm with much better computational complexity than the existing first-order ones. We provide experimental results obtained on text corpora involving millions of documents and hundreds of thousands of features. These results illustrate how Sparse PCA can help organize a large corpus of text data in a user-interpretable way, providing an attractive alternative approach to topic models.
Demixed Principal Component Analysis
Brendel, Wieland, Romo, Ranulfo, Machens, Christian K.
In many experiments, the data points collected live in high-dimensional observation spaces, yet can be assigned a set of labels or parameters. In electrophysiological recordings, for instance, the responses of populations of neurons generally depend on mixtures of experimentally controlled parameters. The heterogeneity and diversity of these parameter dependencies can make visualization and interpretation of such data extremely difficult. Standard dimensionality reduction techniques such as principal component analysis (PCA) can provide a succinct and complete description of the data, but the description is constructed independent of the relevant task variables and is often hard to interpret. Here, we start with the assumption that a particularly informative description is one that reveals the dependency of the high-dimensional data on the individual parameters. We show how to modify the loss function of PCA so that the principal components seek to capture both the maximum amount of variance about the data, while also depending on a minimum number of parameters. We call this method demixed principal component analysis (dPCA) as the principal components here segregate the parameter dependencies. We phrase the problem as a probabilistic graphical model, and present a fast Expectation-Maximization (EM) algorithm. We demonstrate the use of this algorithm for electrophysiological data and show that it serves to demix the parameter-dependence of a neural population response.
High-Dimensional Inference with the generalized Hopfield Model: Principal Component Analysis and Corrections
Cocco, Simona, Monasson, Remi, Sessak, Vitor
We consider the problem of inferring the interactions between a set of N binary variables from the knowledge of their frequencies and pairwise correlations. The inference framework is based on the Hopfield model, a special case of the Ising model where the interaction matrix is defined through a set of patterns in the variable space, and is of rank much smaller than N. We show that Maximum Lik elihood inference is deeply related to Principal Component Analysis when the amp litude of the pattern components, xi, is negligible compared to N^1/2. Using techniques from statistical mechanics, we calculate the corrections to the patterns to the first order in xi/N^1/2. We stress that it is important to generalize the Hopfield model and include both attractive and repulsive patterns, to correctly infer networks with sparse and strong interactions. We present a simple geometrical criterion to decide how many attractive and repulsive patterns should be considered as a function of the sampling noise. We moreover discuss how many sampled configurations are required for a good inference, as a function of the system size, N and of the amplitude, xi. The inference approach is illustrated on synthetic and biological data.
Auto-associative models, nonlinear Principal component analysis, manifolds and projection pursuit
Girard, Stรฉphane, Iovleff, Serge
In this paper, auto-associative models are proposed as candidates to the generalization of Principal Component Analysis. We show that these models are dedicated to the approximation of the dataset by a manifold. Here, the word "manifold" refers to the topology properties of the structure. The approximating manifold is built by a projection pursuit algorithm. At each step of the algorithm, the dimension of the manifold is incremented. Some theoretical properties are provided. In particular, we can show that, at each step of the algorithm, the mean residuals norm is not increased. Moreover, it is also established that the algorithm converges in a finite number of steps. Some particular auto-associative models are exhibited and compared to the classical PCA and some neural networks models. Implementation aspects are discussed. We show that, in numerous cases, no optimization procedure is required. Some illustrations on simulated and real data are presented.
PCA 4 DCA: The Application Of Principal Component Analysis To The Dendritic Cell Algorithm
Gu, Feng, Greensmith, Julie, Oates, Robert, Aickelin, Uwe
As one of the newest members in the field of artificial immune systems (AIS), the Dendritic Cell Algorithm (DCA) is based on behavioural models of natural dendritic cells (DCs). Unlike other AIS, the DCA does not rely on training data, instead domain or expert knowledge is required to predetermine the mapping between input signals from a particular instance to the three categories used by the DCA. This data preprocessing phase has received the criticism of having manually over-?tted the data to the algorithm, which is undesirable. Therefore, in this paper we have attempted to ascertain if it is possible to use principal component analysis (PCA) techniques to automatically categorise input data while still generating useful and accurate classication results. The integrated system is tested with a biometrics dataset for the stress recognition of automobile drivers. The experimental results have shown the application of PCA to the DCA for the purpose of automated data preprocessing is successful.
Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization
Wright, John, Ganesh, Arvind, Rao, Shankar, Peng, Yigang, Ma, Yi
Principal component analysis is a fundamental operation in computational data analysis, with myriad applications ranging from web search to bioinformatics to computer vision and image analysis. However, its performance and applicability in real scenarios are limited by a lack of robustness to outlying or corrupted observations. Thispaper considers the idealized "robust principal component analysis" problem of recovering a low rank matrix A from corrupted observations D A E. Here, the corrupted entries E are unknown and the errors can be arbitrarily large (modeling grossly corrupted observations common in visual and bioinformatic data), but are assumed to be sparse. We prove that most matrices A can be efficiently and exactly recovered from most error sign-and-support patterns bysolving a simple convex program, for which we give a fast and provably convergent algorithm. Our result holds even when the rank of A grows nearly proportionally (up to a logarithmic factor) to the dimensionality of the observation spaceand the number of errors E grows in proportion to the total number of entries in the matrix. A byproduct of our analysis is the first proportional growth results for the related problem of completing a low-rank matrix from a small fraction ofits entries. Simulations and real-data examples corroborate the theoretical results, and suggest potential applications in computer vision.