Principal Component Analysis
FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis
Gang, Arpita, Bajwa, Waheed U.
Principal Component Analysis (PCA) is a fundamental data preprocessing tool in the world of machine learning. While PCA is often thought of as a dimensionality reduction method, the purpose of PCA is actually two-fold: dimension reduction and uncorrelated feature learning. Furthermore, the enormity of the dimensions and sample size in the modern day datasets have rendered the centralized PCA solutions unusable. In that vein, this paper reconsiders the problem of PCA when data samples are distributed across nodes in an arbitrarily connected network. While a few solutions for distributed PCA exist, those either overlook the uncorrelated feature learning aspect of the PCA, tend to have high communication overhead that makes them inefficient and/or lack `exact' or `global' convergence guarantees. To overcome these aforementioned issues, this paper proposes a distributed PCA algorithm termed FAST-PCA (Fast and exAct diSTributed PCA). The proposed algorithm is efficient in terms of communication and is proven to converge linearly and exactly to the principal components, leading to dimension reduction as well as uncorrelated features. The claims are further supported by experimental results.
A Coupled CP Decomposition for Principal Components Analysis of Symmetric Networks
Weylandt, Michael, Michailidis, George
In a number of application domains, one observes a sequence of network data; for example, repeated measurements between users interactions in social media platforms, financial correlation networks over time, or across subjects, as in multi-subject studies of brain connectivity. One way to analyze such data is by stacking networks into a third-order array or tensor. We propose a principal components analysis (PCA) framework for sequence network data, based on a novel decomposition for semi-symmetric tensors. We derive efficient algorithms for computing our proposed "Coupled CP" decomposition and establish estimation consistency of our approach under an analogue of the spiked covariance model with rates the same as the matrix case up to a logarithmic term. Our framework inherits many of the strengths of classical PCA and is suitable for a wide range of unsupervised learning tasks, including identifying principal networks, isolating meaningful changepoints or outliers across observations, and for characterizing the "variability network" of the most varying edges. Finally, we demonstrate the effectiveness of our proposal on simulated data and on examples from political science and financial economics. The proof techniques used to establish our main consistency results are surprisingly straight-forward and may find use in a variety of other matrix and tensor decomposition problems.
Summerville
Procedural Content Generation (PCG) using machine learning is a fast growing area of research. Action Role Playing Game (ARPG) levels represent an interesting challenge for PCG due to their multi-tiered structure and nonlinearity. Previous work has used Bayes Nets (BN) to learn properties of the topological structure of levels from The Legend of Zelda. In this paper we describe a method for sampling these learned distributions to generate valid, playable level topologies. We carry this deeper and learn a sampleable representation of the individual rooms using Principal Component Analysis. We combine the two techniques and present a multi-scale machine learned technique for procedurally generating ARPG levels from a corpus of levels from The Legend of Zelda.
Distributionally Robust Fair Principal Components via Geodesic Descents
Vu, Hieu, Tran, Toan, Yue, Man-Chung, Nguyen, Viet Anh
Principal component analysis is a simple yet useful dimensionality reduction technique in modern machine learning pipelines. In consequential domains such as college admission, healthcare and credit approval, it is imperative to take into account emerging criteria such as the fairness and the robustness of the learned projection. In this paper, we propose a distributionally robust optimization problem for principal component analysis which internalizes a fairness criterion in the objective function. The learned projection thus balances the trade-off between the total reconstruction error and the reconstruction error gap between subgroups, taken in the min-max sense over all distributions in a moment-based ambiguity set. The resulting optimization problem over the Stiefel manifold can be efficiently solved by a Riemannian subgradient descent algorithm with a sub-linear convergence rate. Our experimental results on real-world datasets show the merits of our proposed method over state-of-the-art baselines. Machine learning models are ubiquitous in our daily lives and supporting the decision-making process in diverse domains.
How Principal Component Analysis [PCA] Works
What is PCA: Principal Component Analysis (PCA) is a dimensionality reduction technique that enables you to identify correlations and patterns in a data set so that it can be transferred into a data set of significantly lower dimensions without loss of any important information. Step 1: Standardization of the data Standardization is all about scaling your data in such a way that all the variables and values lie within a similar range. Step 2: Computing the covariance matrix A covariance matrix expresses the correlation between the different variables in the data set. It is essential to identify heavily dependent variables because they contain biased and redundant information which reduces the overall performance of the model. Eigenvectors and Eigenvalues are the mathematical constructs that must be computed from the covariance matrix in order to determine the Principal component of the data sets.
Robust factored principal component analysis for matrix-valued outlier accommodation and detection
Ma, Xuan, Zhao, Jianhua, Wang, Yue
Principal component analysis (PCA) is a popular dimension reduction technique for vector data. Factored PCA (FPCA) is a probabilistic extension of PCA for matrix data, which can substantially reduce the number of parameters in PCA while yield satisfactory performance. However, FPCA is based on the Gaussian assumption and thereby susceptible to outliers. Although the multivariate $t$ distribution as a robust modeling tool for vector data has a very long history, its application to matrix data is very limited. The main reason is that the dimension of the vectorized matrix data is often very high and the higher the dimension, the lower the breakdown point that measures the robustness. To solve the robustness problem suffered by FPCA and make it applicable to matrix data, in this paper we propose a robust extension of FPCA (RFPCA), which is built upon a $t$-type distribution called matrix-variate $t$ distribution. Like the multivariate $t$ distribution, the matrix-variate $t$ distribution can adaptively down-weight outliers and yield robust estimates. We develop a fast EM-type algorithm for parameter estimation. Experiments on synthetic and real-world datasets reveal that RFPCA is compared favorably with several related methods and RFPCA is a simple but powerful tool for matrix-valued outlier detection.
Learnable Faster Kernel-PCA for Nonlinear Fault Detection: Deep Autoencoder-Based Realization
Ren, Zelin, Yang, Xuebing, Jiang, Yuchen, Zhang, Wensheng
Kernel principal component analysis (KPCA) is a well-recognized nonlinear dimensionality reduction method that has been widely used in nonlinear fault detection tasks. As a kernel trick-based method, KPCA inherits two major problems. First, the form and the parameters of the kernel function are usually selected blindly, depending seriously on trial-and-error. As a result, there may be serious performance degradation in case of inappropriate selections. Second, at the online monitoring stage, KPCA has much computational burden and poor real-time performance, because the kernel method requires to leverage all the offline training data. In this work, to deal with the two drawbacks, a learnable faster realization of the conventional KPCA is proposed. The core idea is to parameterize all feasible kernel functions using the novel nonlinear DAE-FE (deep autoencoder based feature extraction) framework and propose DAE-PCA (deep autoencoder based principal component analysis) approach in detail. The proposed DAE-PCA method is proved to be equivalent to KPCA but has more advantage in terms of automatic searching of the most suitable nonlinear high-dimensional space according to the inputs. Furthermore, the online computational efficiency improves by approximately 100 times compared with the conventional KPCA. With the Tennessee Eastman (TE) process benchmark, the effectiveness and superiority of the proposed method is illustrated.
Face Recognition using Principal Component Analysis
Recent advance in machine learning has made face recognition not a difficult problem. But in the previous, researchers have made various attempts and developed various skills to make computer capable of identifying people. One of the early attempt with moderate success is eigenface, which is based on linear algebra techniques. In this tutorial, we will see how we can build a primitive face recognition system with some simple linear algebra technique such as principal component analysis. Face Recognition using Principal Component Analysis Photo by Rach Teo, some rights reserved.
Principal Component Analysis for Visualization
Principal component analysis (PCA) is an unsupervised machine learning technique. Perhaps the most popular use of principal component analysis is dimensionality reduction. Besides using PCA as a data preparation technique, we can also use it to help visualize data. A picture is worth a thousand words. With the data visualized, it is easier for us to get some insights and decide on the next step in our machine learning models.
Plotting Functions for the 'parameters' Package
Beyond computing p-values, CIs, Bayesian indices and other measures for a wide variety of models, this package implements features like bootstrapping of parameters and models, feature reduction (feature extraction and variable selection), or tools for data reduction like functions to perform cluster, factor or principal component analysis. Another important goal of the parameters package is to facilitate and streamline the process of reporting results of statistical models, which includes the easy and intuitive calculation of standardized estimates or robust standard errors and p-values.