Principal Component Analysis
Dirty Statistical Models
Yang, Eunho, Ravikumar, Pradeep K.
We provide a unified framework for the high-dimensional analysis of "superposition-structured" or "dirty" statistical models: where the model parameters are a "superposition" of structurally constrained parameters. We allow for any number and types of structures, and any statistical model. We consider the general class of $M$-estimators that minimize the sum of any loss function, and an instance of what we call a "hybrid" regularization, that is the infimal convolution of weighted regularization functions, one for each structural component. We provide corollaries showcasing our unified framework for varied statistical models such as linear regression, multiple regression and principal component analysis, over varied superposition structures. Papers published at the Neural Information Processing Systems Conference.
Online Robust PCA via Stochastic Optimization
Feng, Jiashi, Xu, Huan, Yan, Shuicheng
Robust PCA methods are typically based on batch optimization and have to load all the samples into memory. This prevents them from efficiently processing big data. In this paper, we develop an Online Robust Principal Component Analysis (OR-PCA) that processes one sample per time instance and hence its memory cost is independent of the data size, significantly enhancing the computation and storage efficiency. The proposed method is based on stochastic optimization of an equivalent reformulation of the batch RPCA method. Indeed, we show that OR-PCA provides a sequence of subspace estimations converging to the optimum of its batch counterpart and hence is provably robust to sparse corruption.
Improved Distributed Principal Component Analysis
Liang, Yingyu, Balcan, Maria-Florina F., Kanchanapally, Vandana, Woodruff, David
We study the distributed computing setting in which there are multiple servers, each holding a set of points, who wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA), in which the servers would like to compute a low dimensional subspace capturing as much of the variance of the union of their point sets as possible. Given a procedure for approximate PCA, one can use it to approximately solve problems such as $k$-means clustering and low rank approximation. The essential properties of an approximate distributed PCA algorithm are its communication cost and computational efficiency for a given desired accuracy in downstream applications. We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for $k$-means clustering and related problems.
Cone-Constrained Principal Component Analysis
Deshpande, Yash, Montanari, Andrea, Richard, Emile
Estimating a vector from noisy quadratic observations is a task that arises naturally in many contexts, from dimensionality reduction, to synchronization and phase retrieval problems. It is often the case that additional information is available about the unknown vector (for instance, sparsity, sign or magnitude of its entries). Many authors propose non-convex quadratic optimization problems that aim at exploiting optimally this information. However, solving these problems is typically NP-hard. We consider a simple model for noisy quadratic observation of an unknown vector $\bvz$.
Robust PCA with compressed data
Ha, Wooseok, Barber, Rina Foygel
The robust principal component analysis (RPCA) problem seeks to separate low-rank trends from sparse outlierswithin a data matrix, that is, to approximate a $n\times d$ matrix $D$ as the sum of a low-rank matrix $L$ and a sparse matrix $S$.We examine the robust principal component analysis (RPCA) problem under data compression, wherethe data $Y$ is approximately given by $(L S)\cdot C$, that is, a low-rank $ $ sparse data matrix that has been compressed to size $n\times m$ (with $m$ substantially smaller than the original dimension $d$) via multiplication witha compression matrix $C$. We give a convex program for recovering the sparse component $S$ along with the compressed low-rank component $L\cdot C$, along with upper bounds on the error of this reconstructionthat scales naturally with the compression dimension $m$ and coincides with existing results for the uncompressedsetting $m d$. Our results can also handle error introduced through additive noise or through missing data.The scaling of dimension, compression, and signal complexity in our theoretical results is verified empirically through simulations, and we also apply our method to a data set measuring chlorine concentration acrossa network of sensors, to test its performance in practice. Papers published at the Neural Information Processing Systems Conference.
Correlated-PCA: Principal Components' Analysis when Data and Noise are Correlated
Given a matrix of observed data, Principal Components Analysis (PCA) computes a small number of orthogonal directions that contain most of its variability. Provably accurate solutions for PCA have been in use for a long time. However, to the best of our knowledge, all existing theoretical guarantees for it assume that the data and the corrupting noise are mutually independent, or at least uncorrelated. This is valid in practice often, but not always. In this paper, we study the PCA problem in the setting where the data and noise can be correlated.
Diffusion Approximations for Online Principal Component Estimation and Global Convergence
Li, Chris Junchi, Wang, Mengdi, Liu, Han, Zhang, Tong
In this paper, we propose to adopt the diffusion approximation tools to study the dynamics of Oja's iteration which is an online stochastic gradient method for the principal component analysis. Oja's iteration maintains a running estimate of the true principal component from streaming data and enjoys less temporal and spatial complexities. We show that the Oja's iteration for the top eigenvector generates a continuous-state discrete-time Markov chain over the unit sphere. We characterize the Oja's iteration in three phases using diffusion approximation and weak convergence tools. Our three-phase analysis further provides a finite-sample error bound for the running estimate, which matches the minimax information lower bound for PCA under the additional assumption of bounded samples.
Optimal Sparse Linear Encoders and Sparse PCA
Magdon-Ismail, Malik, Boutsidis, Christos
Principal components analysis (PCA) is the optimal linear encoder of data. Sparse linear encoders (e.g., sparse PCA) produce more interpretable features that can promote better generalization. We answer both questions by providing the first polynomial-time algorithms to construct \emph{optimal} sparse linear auto-encoders; additionally, we demonstrate the performance of our algorithms on real data. Papers published at the Neural Information Processing Systems Conference.
Distributed Stochastic Algorithms for High-rate Streaming Principal Component Analysis
Raja, Haroon, Bajwa, Waheed U.
This paper considers the problem of estimating the principal eigenvector of a covariance matrix from independent and identically distributed data samples in streaming settings. The streaming rate of data in many contemporary applications can be high enough that a single processor cannot finish an iteration of existing methods for eigenvector estimation before a new sample arrives. This paper formulates and analyzes a distributed variant of the classical Krasulina's method (D-Krasulina) that can keep up with the high streaming rate of data by distributing the computational load across multiple processing nodes. The analysis shows that---under appropriate conditions---D-Krasulina converges to the principal eigenvector in an order-wise optimal manner; i.e., after receiving $M$ samples across all nodes, its estimation error can be $O(1/M)$. In order to reduce the network communication overhead, the paper also develops and analyzes a mini-batch extension of D-Krasulina, which is termed DM-Krasulina. The analysis of DM-Krasulina shows that it can also achieve order-optimal estimation error rates under appropriate conditions, even when some samples have to be discarded within the network due to communication latency. Finally, experiments are performed over synthetic and real-world data to validate the convergence behaviors of D-Krasulina and DM-Krasulina in high-rate streaming settings.
A Fast deflation Method for Sparse Principal Component Analysis via Subspace Projections
Xu, Cong, Yang, Min, Zhang, Jin
Given a data set, PCA aims at finding a sequence of orthogonal vectors that repr esent the directions of largest variance. By capturing these directions, the princ ipal components offer a way to compress the data with minimum information loss. However, principal components are usually linear combinations of all original features. That is, the weights in the linear combinations (known as loadings) are typically nonzero. I n this sense, it is difficult to give a good physical interpretation. During the past decade, various sparse principal component analysis (SPCA) approaches have been developed to improve the interpretabili ty of principal components. SPCA is an extension of PCA that aims at finding sparse loading vectors capturing the maximum amount of variance in the data. These SPCA methods ca n be categorized into two groups: block methods [16,20,22-24,32] and deflati on methods [5,7,25,28]. Block methods aims to find all sparse loadings together, whil e deflation methods compute one loading at a time.