 Principal Component Analysis


A Beginner's Guide to Principal Component Analysis

#artificialintelligence

In principal component analysis, a principal component is a new feature that is constructed from a linear combination of the original features in a dataset. The principal components are ordered such that the first principal component has the highest possible variance (i.e., the greatest amount of spread or dispersion in the data), and each subsequent component in turn has the highest variance possible under the constraint that it is orthogonal (i.e., uncorrelated) to the previous components. The idea behind PCA is to reduce the dimensionality of a dataset by projecting the data onto a lower-dimensional space, while still preserving as much of the variance in the data as possible. This is done by selecting a smaller number of principal components that capture the most important information in the data, and discarding the remaining, less important components. In this way, PCA can be used to identify patterns and relationships in high-dimensional data, and to visualize data in a lower-dimensional space for easier interpretation.
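As a minimal sketch of the construction described above, the following NumPy code computes principal components from an eigendecomposition of the sample covariance matrix. The function name `pca`, the synthetic data, and the choice of two components are illustrative assumptions, not part of the original guide.

```python
import numpy as np

def pca(X, n_components):
    """Return (components, variances, scores) for a data matrix X (rows = samples)."""
    # Center each feature so the covariance describes spread around the mean.
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]     # columns are orthonormal directions
    scores = X_centered @ components           # data projected onto those directions
    return components, eigvals[:n_components], scores

# Synthetic data (assumption): 200 samples, 3 features, one dominant direction.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[3.0, 2.0, 1.0]]) + 0.1 * rng.normal(size=(200, 3))

components, variances, scores = pca(X, n_components=2)
print("variance along each component:", variances)
```

The columns of `components` are mutually orthogonal, and the projected columns in `scores` are uncorrelated, with sample variances equal to the corresponding eigenvalues.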


Quasi-parametric rates for Sparse Multivariate Functional Principal Components Analysis

arXiv.org Machine Learning

This work aims to give non-asymptotic results for estimating the first principal component of a multivariate random process. We first define the covariance function and the covariance operator in the multivariate case. We then define a projection operator. This operator can be seen as a reconstruction step from the raw data in the functional data analysis context. Next, we show that the eigenelements can be expressed as the solution to an optimization problem, and we introduce the LASSO variant of this optimization problem and the associated plugin estimator. Finally, we assess the estimator's accuracy. We establish a minimax lower bound on the mean square reconstruction error of the eigenelement, which proves that the procedure has an optimal variance in the minimax sense.


Domain Adaptation Principal Component Analysis: base linear method for learning with out-of-distribution data

arXiv.org Artificial Intelligence

Domain adaptation is a popular paradigm in modern machine learning which aims at tackling the problem of divergence (or shift) between the labeled training and validation datasets (source domain) and a potentially large unlabeled dataset (target domain). The task is to embed both datasets into a common space in which the source dataset is informative for training while the divergence between source and target is minimized. The most popular domain adaptation solutions are based on training neural networks that combine classification and adversarial learning modules, frequently making them both data-hungry and difficult to train. We present a method called Domain Adaptation Principal Component Analysis (DAPCA) that identifies a linear reduced data representation useful for solving the domain adaptation task. The DAPCA algorithm introduces positive and negative weights between pairs of data points, and generalizes the supervised extension of principal component analysis. DAPCA is an iterative algorithm that solves a simple quadratic optimization problem at each iteration. The convergence of the algorithm is guaranteed, and the number of iterations is small in practice. We validate the suggested algorithm on previously proposed benchmarks for solving the domain adaptation task. We also show the benefit of using DAPCA in analyzing single-cell omics datasets in biomedical applications. Overall, DAPCA can serve as a practical preprocessing step in many machine learning applications leading to reduced dataset representations, taking into account possible divergence between source and target domains.


Principal Component Analysis for Dimensionality Reduction in Python - MachineLearningMastery.com

#artificialintelligence

Reducing the number of input variables for a predictive model is referred to as dimensionality reduction. Fewer input variables can result in a simpler predictive model that may have better performance when making predictions on new data. Perhaps the most popular technique for dimensionality reduction in machine learning is Principal Component Analysis, or PCA for short. This is a technique that comes from the field of linear algebra and can be used as a data preparation technique to create a projection of a dataset prior to fitting a model. In this tutorial, you will discover how to use PCA for dimensionality reduction when developing predictive models.


An online algorithm for contrastive Principal Component Analysis

arXiv.org Artificial Intelligence

Finding informative low-dimensional representations that can be computed efficiently in large datasets is an important problem in data analysis. Recently, contrastive Principal Component Analysis (cPCA) was proposed as a more informative generalization of PCA that takes advantage of contrastive learning. However, the performance of cPCA is sensitive to hyper-parameter choice and there is currently no online algorithm for implementing cPCA. Here, we introduce a modified cPCA method, which we denote cPCA*, that is more interpretable and less sensitive to the choice of hyper-parameter. We derive an online algorithm for cPCA* and show that it maps onto a neural network with local learning rules, so it can potentially be implemented in energy efficient neuromorphic hardware. We evaluate the performance of our online algorithm on real datasets and highlight the differences and similarities with the original formulation.
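For context, the original cPCA is commonly formulated as taking the leading eigenvectors of the target covariance minus a multiple alpha of the background covariance, where alpha is the hyper-parameter referred to above. The sketch below illustrates only that formulation, not the cPCA* method or the online algorithm of this paper. The function names, the synthetic data, and the value alpha = 1 are assumptions made for illustration.

```python
import numpy as np

def top_direction(matrix):
    """Unit eigenvector of a symmetric matrix with the largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(matrix)  # eigenvalues in ascending order
    return eigvecs[:, -1]

def cpca_direction(target, background, alpha):
    """Leading contrastive direction: top eigenvector of C_target - alpha * C_background."""
    c_target = np.cov(target, rowvar=False)
    c_background = np.cov(background, rowvar=False)
    return top_direction(c_target - alpha * c_background)

# Synthetic data (assumption): both datasets share a high-variance nuisance
# direction (feature 0); only the target has extra variance in feature 1.
rng = np.random.default_rng(0)
background = rng.normal(size=(2000, 3)) * np.array([3.0, 0.5, 0.5])
target = rng.normal(size=(2000, 3)) * np.array([3.0, 2.0, 0.5])

v_pca = top_direction(np.cov(target, rowvar=False))          # ordinary PCA on the target
v_contrast = cpca_direction(target, background, alpha=1.0)   # contrastive direction

print("PCA direction:        ", np.round(v_pca, 2))
print("contrastive direction:", np.round(v_contrast, 2))
```

In this toy setting ordinary PCA follows the shared nuisance direction, while the contrastive direction follows the feature whose variance is specific to the target. Changing `alpha` changes how strongly the background is penalized, which is the sensitivity the abstract mentions.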


Symmetry-Aware Autoencoders: s-PCA and s-nlPCA

arXiv.org Artificial Intelligence

Nonlinear principal component analysis (NLPCA) via autoencoders has attracted attention in the dynamical systems community due to its larger compression rate when compared to linear principal component analysis (PCA). These model reduction methods experience an increase in the dimensionality of the latent space when applied to datasets that exhibit invariant samples due to the presence of symmetries. In this study, we introduce a novel machine learning embedding for autoencoders, which uses Siamese networks and spatial transformer networks to account for discrete and continuous symmetries, respectively. The Siamese branches autonomously find a fundamental domain to which all samples are transformed, without introducing human bias. The spatial transformer network discovers the optimal slicing template for continuous translations so that invariant samples are aligned in the homogeneous direction. Thus, the proposed symmetry-aware autoencoder is invariant to predetermined input transformations. This embedding can be employed with both linear and nonlinear reduction methods, which we term symmetry-aware PCA (s-PCA) and symmetry-aware NLPCA (s-NLPCA). We apply the proposed framework to the Kolmogorov flow to showcase the capabilities for a system exhibiting both a continuous symmetry and discrete symmetries.


Covariance matrix preparation for quantum principal component analysis

arXiv.org Artificial Intelligence

Principal component analysis (PCA) is a dimensionality reduction method in data analysis that involves diagonalizing the covariance matrix of the dataset. Recently, quantum algorithms have been formulated for PCA based on diagonalizing a density matrix. These algorithms assume that the covariance matrix can be encoded in a density matrix, but a concrete protocol for this encoding has been lacking. Our work aims to address this gap. Assuming amplitude encoding of the data, with the data given by the ensemble $\{p_i,| \psi_i \rangle\}$, one can easily prepare the ensemble average density matrix $\overline{\rho} = \sum_i p_i |\psi_i\rangle \langle \psi_i |$. We first show that $\overline{\rho}$ is precisely the covariance matrix whenever the dataset is centered. For quantum datasets, we exploit global phase symmetry to argue that there always exists a centered dataset consistent with $\overline{\rho}$, and hence $\overline{\rho}$ can always be interpreted as a covariance matrix. This provides a simple means for preparing the covariance matrix for arbitrary quantum datasets or centered classical datasets. For uncentered classical datasets, our method is so-called "PCA without centering", which we interpret as PCA on a symmetrized dataset. We argue that this closely corresponds to standard PCA, and we derive equations and inequalities that bound the deviation of the spectrum obtained with our method from that of standard PCA. We numerically illustrate our method for the MNIST handwritten digit dataset. We also argue that PCA on quantum datasets is natural and meaningful, and we numerically implement our method for molecular ground-state datasets.
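The relation between $\overline{\rho}$ and the covariance matrix can be checked numerically with a small classical example. The sketch below assumes real-valued data vectors normalized to unit length (a stand-in for amplitude encoding) and uniform weights $p_i$. All variable names and the random data are illustrative assumptions rather than the authors' protocol.

```python
import numpy as np

rng = np.random.default_rng(1)

# Uncentered classical data (assumption): 50 samples, 4 features, nonzero mean.
X = rng.normal(loc=1.0, size=(50, 4))
psi = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm "states"
p = np.full(len(psi), 1.0 / len(psi))                # uniform probabilities

# Ensemble average density matrix: rho = sum_i p_i |psi_i><psi_i|
rho = (psi * p[:, None]).T @ psi

# Covariance of the same ensemble: rho minus the outer product of the mean.
mu = p @ psi
cov = rho - np.outer(mu, mu)

# Symmetrized dataset: each state and its negative, with half the weight each.
psi_sym = np.vstack([psi, -psi])
p_sym = np.concatenate([p, p]) / 2.0
mu_sym = p_sym @ psi_sym
cov_sym = (psi_sym * p_sym[:, None]).T @ psi_sym - np.outer(mu_sym, mu_sym)

print("trace of rho:", np.trace(rho))
print("rho equals covariance of symmetrized data:", np.allclose(rho, cov_sym))
```

The symmetrized ensemble has zero mean, so its covariance coincides with $\overline{\rho}$, while for the uncentered data the two differ by the outer product of the mean vector.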


Intermediate Machine Learning: Principal Component Analysis (PCA) - PythonAlgos

#artificialintelligence

Welcome to the third module in our Machine Learning series. So far we've covered Linear Regression and Logistic Regression. Just to recap, Linear Regression is the simplest implementation of continuous prediction. Now let's get into something a little more complex – Principal Component Analysis (PCA) in Python. PCA is a dimensionality reduction technique.


Test-Time Adaptation with Principal Component Analysis

arXiv.org Artificial Intelligence

Machine Learning models are prone to fail when test data are different from training data, a situation often encountered in real applications known as distribution shift. While still valid, the training-time knowledge becomes less effective, requiring a test-time adaptation to maintain high performance. Following approaches that assume batch-norm layers and use their statistics for adaptation, we propose a Test-Time Adaptation with Principal Component Analysis (TTAwPCA), which presumes a fitted PCA and adapts at test time a spectral filter based on the singular values of the PCA for robustness to corruptions. TTAwPCA combines three components: the output of a given layer is decomposed using a Principal Component Analysis (PCA), filtered by a penalization of its singular values, and reconstructed with the PCA inverse transform. This generic enhancement adds fewer parameters than current methods. Experiments on CIFAR-10-C and CIFAR-100-C demonstrate the effectiveness and limits of our method using a unique filter of 2000 parameters.


Generative Principal Component Analysis

arXiv.org Artificial Intelligence

In this paper, we study the problem of principal component analysis with generative modeling assumptions, adopting a general model for the observed matrix that encompasses notable special cases, including spiked matrix recovery and phase retrieval. The key assumption is that the underlying signal lies near the range of an L-Lipschitz continuous generative model with bounded k-dimensional inputs. Moreover, we provide a variant of the classic power method, which projects the calculated data onto the range of the generative model during each iteration. We show that under suitable conditions, this method converges exponentially fast to a point achieving the above-mentioned statistical rate. We perform experiments on various image datasets for spiked matrix and phase retrieval models, and illustrate performance gains of our method over the classic power method and the truncated power method devised for sparse principal component analysis. Principal component analysis (PCA) is one of the most popular techniques for data processing and dimensionality reduction [1], with an abundance of applications such as image recognition [2], gene expression data analysis [3], and clustering [4], [5]. PCA seeks to find the directions that capture maximal variances in vector-valued data. Z. Liu is with the Department of Computer Science, National University of Singapore (email: dcslizha@nus.edu.sg).
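The power-method variants mentioned in the abstract share one loop: multiply by the matrix, apply a projection, and renormalize. The sketch below is a generic illustration in plain Python, not the authors' algorithm. The `project` callback is a hypothetical stand-in for the projection onto the range of a generative model. With the identity it reduces to the classic power method, and with hard thresholding (`keep_top_k`, also an illustrative name) it behaves like a truncated power method for sparse PCA.

```python
import math

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def keep_top_k(v, k):
    """Hard threshold: keep the k largest-magnitude entries, zero the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]

def projected_power_method(A, v0, project=lambda v: v, n_iter=100):
    """Power iteration with a projection step applied after each multiplication."""
    v = normalize(v0)
    for _ in range(n_iter):
        v = normalize(project(matvec(A, v)))
    return v

# Small symmetric test matrix (assumption chosen for illustration).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.0],
     [0.0, 0.0, 1.0]]

v_classic = projected_power_method(A, [1.0, 1.0, 1.0])
v_sparse = projected_power_method(A, [1.0, 1.0, 1.0],
                                  project=lambda v: keep_top_k(v, 1))

# Rayleigh quotient v^T A v estimates the leading eigenvalue.
rayleigh = sum(x * y for x, y in zip(v_classic, matvec(A, v_classic)))
print("leading eigenvalue estimate:", rayleigh)
print("1-sparse direction:", v_sparse)
```

Keeping the projection as a separate callback makes the contrast between the variants explicit: only the constraint set changes, while the multiply-and-normalize loop stays the same.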