Principal Component Analysis
Polar $n$-Complex and $n$-Bicomplex Singular Value Decomposition and Principal Component Pursuit
Chan, Tak-Shing T., Yang, Yi-Hsuan
Informed by recent work on tensor singular value decomposition and circulant algebra matrices, this paper presents a new theoretical bridge that unifies the hypercomplex and tensor-based approaches to singular value decomposition and robust principal component analysis. We begin by extending principal component pursuit to Olariu's polar $n$-complex numbers as well as their bicomplex counterparts. In doing so, we derive the polar $n$-complex and $n$-bicomplex proximity operators for both the $\ell_1$- and trace-norm regularizers, which can be used by proximal optimization methods such as the alternating direction method of multipliers. Experimental results on two sets of audio data show that our algebraically informed formulation outperforms tensor robust principal component analysis. We conclude with the message that an informed definition of the trace norm can bridge the gap between the hypercomplex and tensor-based approaches. Our approach can be seen as a general methodology for generating other principal component pursuit algorithms with proper algebraic structures.
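As context for the two proximity operators mentioned in the abstract, the sketch below shows their familiar real-valued counterparts used by ordinary principal component pursuit: entrywise soft thresholding for the $\ell_1$ norm and singular value thresholding for the trace (nuclear) norm. The paper's contribution is the polar $n$-complex and $n$-bicomplex versions of these operators, which are not reproduced here; the threshold values in the example are arbitrary.

```python
# Standard real-valued proximity operators for principal component pursuit
# (a minimal sketch; the paper derives polar n-complex/n-bicomplex analogues).
import numpy as np

def prox_l1(X, tau):
    """Soft thresholding: proximity operator of tau * ||X||_1 (entrywise)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def prox_trace(X, tau):
    """Singular value thresholding: proximity operator of tau * ||X||_* (trace norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# One ADMM-style splitting step: M is approximated by L (low rank) + S (sparse).
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 40))
L = prox_trace(M, tau=1.0)    # low-rank component via singular value thresholding
S = prox_l1(M - L, tau=0.1)   # sparse residual via soft thresholding
```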
Diffusion Approximations for Online Principal Component Estimation and Global Convergence
Li, Chris Junchi, Wang, Mengdi, Liu, Han, Zhang, Tong
In this paper, we adopt diffusion approximation tools to study the dynamics of Oja's iteration, an online stochastic gradient descent method for principal component analysis. Oja's iteration maintains a running estimate of the true principal component from streaming data and enjoys low time and space complexity. We show that Oja's iteration for the top eigenvector generates a continuous-state, discrete-time Markov chain over the unit sphere. We characterize Oja's iteration in three phases using diffusion approximation and weak convergence tools. Our three-phase analysis further provides a finite-sample error bound for the running estimate, which matches the minimax information lower bound for principal component analysis under the additional assumption of bounded samples.
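For reference, a minimal sketch of Oja's iteration for the top principal component is given below; the step size `eta` and the synthetic data are assumptions for illustration, and the paper's analysis concerns the dynamics of this recursion rather than a particular implementation.

```python
# Oja's iteration for the top eigenvector from streaming data (minimal sketch).
import numpy as np

def oja_top_eigenvector(stream, dim, eta=0.01, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(dim)
    w /= np.linalg.norm(w)          # the running estimate lives on the unit sphere
    for x in stream:
        w += eta * x * (x @ w)      # stochastic gradient step
        w /= np.linalg.norm(w)      # project back onto the sphere
    return w

# Usage: samples whose top principal direction is the first coordinate axis.
rng = np.random.default_rng(1)
cov = np.diag([5.0, 1.0, 1.0, 1.0])
data = rng.multivariate_normal(np.zeros(4), cov, size=5000)
w_hat = oja_top_eigenvector(data, dim=4)
```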
Understanding Dimension Reduction with Principal Component Analysis (PCA)
Big data analytics is a buzzword nowadays; everyone is talking about it. It has found applications in many sectors, such as medicine, politics, and dating. Yet although big data analytics improves many aspects of human life, it comes with its own problems, one of which is the 'curse of dimensionality'.
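A minimal illustration of the kind of dimension reduction the article discusses, assuming scikit-learn and its digits dataset (any high-dimensional dataset would do); this example is not part of the original article.

```python
# PCA-based dimension reduction: 64 pixel features reduced to 10 components.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)        # 64-dimensional pixel features
pca = PCA(n_components=10)                 # keep 10 principal components
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)
print("variance retained:", pca.explained_variance_ratio_.sum())
```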
Neural Component Analysis for Fault Detection
Principal component analysis (PCA) is widely adopted for chemical process monitoring, and numerous PCA-based systems have been developed to solve various fault detection and diagnosis problems. Since PCA-based methods assume that the monitored process is linear, nonlinear PCA models, such as autoencoder models and kernel principal component analysis (KPCA), have been proposed and applied to nonlinear process monitoring. However, KPCA-based methods need to perform eigen-decomposition (ED) on the kernel Gram matrix, whose dimensions depend on the number of training samples. Moreover, kernel parameters fixed in advance cannot be optimal for all faults, since different faults may need different parameters to maximize their respective detection performance. Autoencoder models, in turn, do not consider the orthogonality constraints that are crucial for PCA-based algorithms. To address these problems, this paper proposes a novel nonlinear method, called neural component analysis (NCA), which trains a feedforward neural network with orthogonality constraints such as those used in PCA. NCA can adaptively learn its parameters through backpropagation, and the dimensionality of the nonlinear features is independent of the number of training samples. Extensive experimental results on the Tennessee Eastman (TE) benchmark process show the superiority of NCA in terms of missed detection rate (MDR) and false alarm rate (FAR). The source code of NCA is available at https://github.com/haitaozhao/Neural-Component-Analysis.git.
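A generic sketch, in the spirit of the abstract, of training a feedforward encoder whose projection weights are softly constrained to be orthogonal via a penalty $\|WW^\top - I\|_F^2$. The network sizes, penalty weight, reconstruction loss, and random data are assumptions; the authors' actual implementation is in the repository linked above.

```python
# Feedforward encoder with a soft orthogonality penalty on its projection layer
# (a sketch of the general idea, not the authors' NCA implementation).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        self.hidden = nn.Linear(d_in, d_hidden)
        self.proj = nn.Linear(d_hidden, d_out, bias=False)

    def forward(self, x):
        return self.proj(torch.tanh(self.hidden(x)))

def orthogonality_penalty(layer):
    W = layer.weight                      # shape (d_out, d_hidden)
    gram = W @ W.t()
    eye = torch.eye(gram.shape[0], device=W.device)
    return ((gram - eye) ** 2).sum()      # ||W W^T - I||_F^2

X = torch.randn(256, 52)                  # e.g. 52 process variables (toy data)
enc = Encoder(52, 30, 9)
dec = nn.Linear(9, 52)                    # decoder for a reconstruction objective
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    Z = enc(X)
    loss = ((dec(Z) - X) ** 2).mean() + 1e-2 * orthogonality_penalty(enc.proj)
    loss.backward()
    opt.step()
```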
Contrastive Principal Component Analysis
Abid, Abubakar, Zhang, Martin J., Bagaria, Vivek K., Zou, James
We present a new technique called contrastive principal component analysis (cPCA) that is designed to discover low-dimensional structure that is unique to a dataset, or enriched in one dataset relative to other data. The technique is a generalization of standard PCA for the setting where multiple datasets are available -- e.g., a treatment and a control group, or a mixed versus a homogeneous population -- and the goal is to explore patterns that are specific to one of the datasets. We conduct a wide variety of experiments in which cPCA identifies important dataset-specific patterns that are missed by PCA, demonstrating that it is useful for many applications: subgroup discovery, visualizing trends, feature selection, denoising, and data-dependent standardization. We provide geometrical interpretations of cPCA and show that it satisfies desirable theoretical guarantees. We also extend cPCA to nonlinear settings in the form of kernel cPCA. We have released our code as a Python package, with documentation available on GitHub.
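A minimal sketch of the core cPCA computation: take the top eigenvectors of the contrast matrix $C_{\mathrm{target}} - \alpha\, C_{\mathrm{background}}$. Here the contrast strength `alpha` is treated as a user-supplied parameter; the released package should be consulted for the full method.

```python
# Contrastive PCA: eigenvectors of the contrast between two covariance matrices.
import numpy as np

def cpca(target, background, n_components=2, alpha=1.0):
    target = target - target.mean(axis=0)
    background = background - background.mean(axis=0)
    c_target = target.T @ target / (len(target) - 1)
    c_background = background.T @ background / (len(background) - 1)
    contrast = c_target - alpha * c_background
    evals, evecs = np.linalg.eigh(contrast)           # ascending eigenvalues
    directions = evecs[:, ::-1][:, :n_components]     # top contrastive directions
    return target @ directions, directions            # embedding and directions
```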
Lazy stochastic principal component analysis
Wojnowicz, Michael, Nguyen, Dinh, Li, Li, Zhao, Xuan
Stochastic principal component analysis (SPCA) has become a popular dimensionality reduction strategy for large, high-dimensional datasets. We derive a simplified algorithm, called Lazy SPCA, which has reduced computational complexity and is better suited for large-scale distributed computation. We prove that SPCA and Lazy SPCA find the same approximations to the principal subspace, and that the pairwise distances between samples in the lower-dimensional space are invariant to whether SPCA is executed lazily or not. Empirical studies find downstream predictive performance to be identical for both methods, and superior to random projections, across a range of predictive models (linear regression, logistic lasso, and random forests). In our largest experiment, with 4.6 million samples, Lazy SPCA reduced 43.7 hours of computation to 9.9 hours. Overall, Lazy SPCA relies exclusively on matrix multiplications, apart from an operation on a small square matrix whose size depends only on the target dimensionality.
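The sketch below is a generic randomized-sketching PCA that matches the abstract's description (only matrix multiplications plus one decomposition of a small $k \times k$ matrix); it is not necessarily the authors' exact Lazy SPCA algorithm, and the random test matrix and whitening step are assumptions.

```python
# Randomized sketch-based approximation of the top-k principal subspace,
# using only matrix products and one small k x k eigendecomposition.
import numpy as np

def sketch_pca(X, k, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((d, k))        # random test matrix
    Y = X @ omega                              # n x k sketch (matrix product only)
    evals, V = np.linalg.eigh(Y.T @ Y)         # the only decomposition: k x k
    # Whiten the sketch to obtain an orthonormal basis spanning (approximately)
    # the top-k principal subspace of the sample cloud.
    Q = Y @ V @ np.diag(1.0 / np.sqrt(np.maximum(evals, 1e-12)))
    return Q                                   # n x k low-dimensional embedding
```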
Bayesian nonparametric Principal Component Analysis
Elvira, Clément, Chainais, Pierre, Dobigeon, Nicolas
Principal component analysis (PCA) is a popular tool for dimension reduction. The selection of the number of significant components is essential but is often based on practical heuristics that depend on the application. Only a few works have proposed a probabilistic approach able to infer the number of significant components. To this end, this paper introduces a Bayesian nonparametric principal component analysis (BNP-PCA). The proposed model projects observations onto a random orthogonal basis which is assigned a prior distribution defined on the Stiefel manifold. The prior on factor scores involves an Indian buffet process to model the uncertainty related to the number of components. The parameters of interest as well as the nuisance parameters are finally inferred within a fully Bayesian framework via Monte Carlo sampling. A study of the (in)consistency of the marginal maximum a posteriori estimator of the latent dimension is carried out, and a new estimator of the subspace dimension is proposed. Moreover, for the sake of statistical significance, a Kolmogorov-Smirnov test based on the posterior distribution of the principal components is used to refine this estimate. The behaviour of the algorithm is first studied on various synthetic examples. Finally, the proposed BNP dimension reduction approach is shown to be easily yet efficiently coupled with clustering or latent factor models within a single framework.
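As one concrete ingredient of such a model, the sketch below draws a uniformly (Haar) distributed orthonormal basis on the Stiefel manifold, the kind of random orthogonal basis onto which BNP-PCA projects the observations. The Indian buffet process prior on the factor scores and the Monte Carlo sampler are not reproduced here.

```python
# Sample a Haar-distributed orthonormal basis on the Stiefel manifold V_k(R^d).
import numpy as np

def sample_stiefel(d, k, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    A = rng.standard_normal((d, k))
    Q, R = np.linalg.qr(A)
    # Fix column signs so that the distribution is exactly uniform (Haar).
    return Q * np.sign(np.diag(R))

U = sample_stiefel(10, 3)
print(np.allclose(U.T @ U, np.eye(3)))   # columns are orthonormal
```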
Informed Non-convex Robust Principal Component Analysis with Features
Xue, Niannan, Deng, Jiankang, Panagakis, Yannis, Zafeiriou, Stefanos
We revisit the problem of robust principal component analysis with features acting as prior side information. To this end, a novel, elegant, non-convex optimization approach is proposed to decompose a given observation matrix into a low-rank core and the corresponding sparse residual. Rigorous theoretical analysis of the proposed algorithm yields exact recovery guarantees with low computational complexity. Aptly designed synthetic experiments demonstrate that our method is the first to wholly harness the power of non-convexity over convexity in terms of both recoverability and speed: the proposed non-convex approach is more accurate and faster than the best available algorithms for the problem under study. Two real-world applications, namely image classification and face denoising, further exemplify the practical superiority of the proposed method.
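For orientation, a generic non-convex alternating scheme for the low-rank-plus-sparse decomposition $M \approx L + S$ is sketched below (truncated SVD for the low-rank core, hard thresholding for the sparse residual). This is a standard pattern, not the paper's feature-informed algorithm, and the rank, threshold, and iteration count are assumptions.

```python
# Generic non-convex alternating low-rank + sparse decomposition (sketch only).
import numpy as np

def alt_rpca(M, rank, threshold, n_iter=50):
    S = np.zeros_like(M)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]   # best rank-r fit
        R = M - L
        S = np.where(np.abs(R) > threshold, R, 0.0)       # keep only large residuals
    return L, S
```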
Manifold Learning Using Kernel Density Estimation and Local Principal Components Analysis
Mohammed, Kitty, Narayanan, Hariharan
We consider the problem of recovering a $d$-dimensional manifold $\mathcal{M} \subset \mathbb{R}^n$ when provided with noiseless samples from $\mathcal{M}$. There are many algorithms (e.g., Isomap) that are used in practice to fit manifolds and thus reduce the dimensionality of a given data set. Ideally, the estimate $\mathcal{M}_\mathrm{put}$ of $\mathcal{M}$ should be an actual manifold of a certain smoothness; furthermore, $\mathcal{M}_\mathrm{put}$ should be arbitrarily close to $\mathcal{M}$ in Hausdorff distance given a large enough sample. Generally speaking, existing manifold learning algorithms do not meet these criteria. Fefferman, Mitter, and Narayanan (2016) have developed an algorithm whose output is provably a manifold. The key idea is to define an approximate squared-distance function (asdf) to $\mathcal{M}$. Then, $\mathcal{M}_\mathrm{put}$ is given by the set of points where the gradient of the asdf is orthogonal to the subspace spanned by the eigenvectors corresponding to the largest $n - d$ eigenvalues of the Hessian of the asdf. As long as the asdf meets certain regularity conditions, $\mathcal{M}_\mathrm{put}$ is a manifold that is arbitrarily close in Hausdorff distance to $\mathcal{M}$. In this paper, we define two asdfs that can be calculated from the data and show that they meet the required regularity conditions. The first asdf is based on kernel density estimation, and the second is based on estimation of tangent spaces using local principal components analysis.
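A minimal sketch of the local-PCA ingredient of the second asdf: estimate the tangent space of the manifold at a query point from the top $d$ principal directions of its $k$ nearest neighbours. The neighbourhood size `k` and the toy data are assumptions; the construction of the asdf from these tangent-space estimates is not shown.

```python
# Tangent space estimation via local principal components analysis (sketch).
import numpy as np

def local_tangent_space(points, query, d, k=20):
    dists = np.linalg.norm(points - query, axis=1)
    neighbours = points[np.argsort(dists)[:k]]       # k nearest neighbours
    centered = neighbours - neighbours.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:d]            # rows span the estimated tangent space at the query

# Usage: a noiseless circle (d = 1) embedded in R^3.
t = np.linspace(0, 2 * np.pi, 500, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
T = local_tangent_space(circle, circle[0], d=1)
```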