Principal Component Analysis
Data-driven identification and analysis of the glass transition in polymer melts
Banerjee, Atreyee, Hsu, Hsiao-Ping, Kremer, Kurt, Kukharenko, Oleksandra
Understanding the nature of the glass transition and precisely estimating the glass transition temperature of polymeric materials remain open questions in both experimental and theoretical polymer science. We propose a data-driven approach that exploits the high-resolution details accessible through molecular dynamics simulations and considers the structural information of individual chains. It clearly identifies the glass transition temperature of polymer melts of weakly semiflexible chains. By combining principal component analysis and clustering, we identify the glass transition temperature in the asymptotic limit even from relatively short-time trajectories that just reach into the Rouse-like monomer displacement regime. We demonstrate that fluctuations captured by the principal component analysis reflect the change in a chain's behaviour: from conformational rearrangements above the glass transition temperature to small rearrangements below it. Our approach is straightforward to apply and should be applicable to other polymeric glass-forming liquids.
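The sketch below is a generic illustration of pairing PCA with clustering. It is not the authors' pipeline. It pools per-snapshot feature vectors from several temperatures and fits PCA. It then measures each temperature's fluctuation along the leading components and splits the temperatures into two groups with one-dimensional 2-means. Several pieces are hypothetical assumptions: the feature vectors, the helper names, the choice of three components, and the toy data whose fluctuation amplitude drops below T = 0.5.

```python
import numpy as np


def fluctuation_along_pcs(features_by_T, n_components=3):
    # Hypothetical helper: fit PCA on snapshots pooled over all temperatures.
    X = np.vstack(list(features_by_T.values()))
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    W = Vt[:n_components].T  # leading principal directions as columns
    # For each temperature: total variance of its snapshots along those directions.
    return {T: float(((F - F.mean(axis=0)) @ W).var(axis=0).sum())
            for T, F in features_by_T.items()}


def two_means_1d(values, n_iter=50):
    # Lloyd's algorithm with k = 2 on scalars, initialized at the min and max.
    v = np.asarray(values, dtype=float)
    centers = np.array([v.min(), v.max()])
    for _ in range(n_iter):
        labels = np.abs(v[:, None] - centers[None, :]).argmin(axis=1)
        centers = np.array([v[labels == j].mean() for j in (0, 1)])
    return labels


# Toy data (hypothetical): 20-dimensional descriptors per snapshot whose
# fluctuation amplitude drops by a factor of 10 below T = 0.5.
rng = np.random.default_rng(0)
temps = np.round(np.arange(0.30, 0.81, 0.05), 2)
features_by_T = {T: rng.normal(scale=1.0 if T >= 0.5 else 0.1, size=(200, 20))
                 for T in temps}
fluct = fluctuation_along_pcs(features_by_T)
labels = two_means_1d(np.log([fluct[T] for T in temps]))
low = temps[labels == labels[0]]    # cluster containing the lowest temperature
high = temps[labels != labels[0]]
T_estimate = (low.max() + high.min()) / 2
```

Clustering on the logarithm of the fluctuation is a design choice. Fluctuations on either side of a transition can differ by orders of magnitude.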
A Dual Formulation for Probabilistic Principal Component Analysis
De Plaen, Henri, Suykens, Johan A. K.
In this paper, we characterize Probabilistic Principal Component Analysis in Hilbert spaces and demonstrate how the optimal solution admits a representation in dual space. This allows us to develop a generative framework for kernel methods. Furthermore, we show how it encompasses Kernel Principal Component Analysis and illustrate its workings on a toy and a real dataset.
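The sketch below makes the primal/dual idea concrete in the simplest setting: a linear kernel in finite dimension. It computes the classical maximum-likelihood PPCA solution of Tipping and Bishop from the n × n Gram matrix instead of the d × d covariance. It is not the paper's Hilbert-space formulation. The function name and the data are assumptions.

```python
import numpy as np


def ppca_from_gram(X, q):
    """ML probabilistic PCA (Tipping & Bishop) computed in dual form.

    X: (n, d) data matrix, one sample per row; q: latent dimension (q < d).
    Returns the loading matrix W (d x q) and the noise variance sigma2.
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    K = Xc @ Xc.T / n                      # dual (Gram) matrix, n x n
    lam, V = np.linalg.eigh(K)             # ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]
    # The nonzero eigenvalues of K equal those of the covariance Xc.T @ Xc / n,
    # and trace(K) equals the trace of that covariance.
    sigma2 = (np.trace(K) - lam[:q].sum()) / (d - q)   # mean of discarded eigenvalues
    U = Xc.T @ V[:, :q] / np.sqrt(n * lam[:q])         # primal eigenvectors, unit norm
    W = U * np.sqrt(lam[:q] - sigma2)                  # W = U (Lambda - sigma2 I)^(1/2)
    return W, sigma2


# With n = 10 samples in d = 50 dimensions, only a 10 x 10 eigenproblem is solved.
W, sigma2 = ppca_from_gram(np.random.default_rng(0).normal(size=(10, 50)), q=2)
```

Only n × n quantities are decomposed. The d × d covariance never appears, which is the practical appeal of the dual form when n is much smaller than d.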
Deep Unrolling for Nonconvex Robust Principal Component Analysis
Tan, Elizabeth Z. C., Chaux, Caroline, Soubies, Emmanuel, Tan, Vincent Y. F.
We design algorithms for Robust Principal Component Analysis (RPCA), which consists of decomposing a matrix into the sum of a low-rank matrix and a sparse matrix. We propose a deep unrolled algorithm based on an accelerated alternating projection algorithm that aims to solve RPCA in its nonconvex form. The proposed procedure combines the benefits of deep neural networks with the interpretability of the original algorithm, and it automatically learns hyperparameters. We demonstrate the unrolled algorithm's effectiveness on synthetic datasets and on a face modeling problem, where it leads to both better numerical and better visual performance.
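A minimal, non-learned baseline illustrates the low-rank-plus-sparse split. It alternates between two projections: onto rank-r matrices (truncated SVD) and onto sparse matrices (hard thresholding). The sketch assumes that the rank r and the threshold zeta are known and fixed. It uses no acceleration and no unrolling, and the function names and toy data are hypothetical. In the paper's unrolled algorithm, hyperparameters of this kind are learned instead.

```python
import numpy as np


def rank_r_projection(X, r):
    # Best rank-r approximation of X in Frobenius norm (truncated SVD).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]


def hard_threshold(X, zeta):
    # Keep entries with magnitude above zeta, set the rest to zero.
    return np.where(np.abs(X) > zeta, X, 0.0)


def rpca_alternating_projection(M, r, zeta, n_iter=200):
    # Plain (non-accelerated, non-unrolled) alternating projections with a
    # fixed, hand-picked threshold zeta.
    S = hard_threshold(M, zeta)
    for _ in range(n_iter):
        L = rank_r_projection(M - S, r)   # projection onto rank-r matrices
        S = hard_threshold(M - L, zeta)   # projection onto sparse matrices
    return L, S


# Toy problem: rank-2 matrix plus 5% large sparse corruptions.
rng = np.random.default_rng(0)
n, r = 60, 2
L_true = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
mask = rng.random((n, n)) < 0.05
signs = rng.choice([-1.0, 1.0], size=(n, n))
S_true = np.where(mask, signs * rng.uniform(50, 100, size=(n, n)), 0.0)
M = L_true + S_true

L_hat, S_hat = rpca_alternating_projection(M, r, zeta=25.0)
rel_err = np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true)
naive_err = np.linalg.norm(rank_r_projection(M, r) - L_true) / np.linalg.norm(L_true)
```

Here the corruptions are much larger than the low-rank entries, so a single fixed threshold separates them. With less favorable data, a decreasing threshold schedule is usually needed. Choosing that schedule by hand is the kind of tuning that unrolling aims to replace.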
Fault Detection via Occupation Kernel Principal Component Analysis
Morrison, Zachary, Russo, Benjamin P., Lian, Yingzhao, Kamalapurkar, Rushikesh
The reliable operation of automatic systems is heavily dependent on the ability to detect faults in the underlying dynamical system. While traditional model-based methods have been widely used for fault detection, data-driven approaches have garnered increasing attention due to their ease of deployment and minimal need for expert knowledge. In this paper, we present a novel principal component analysis (PCA) method that uses occupation kernels. Occupation kernels result in feature maps that are tailored to the measured data, have inherent noise-robustness due to the use of integration, and can utilize irregularly sampled system trajectories of variable lengths for PCA. The occupation kernel PCA method is used to develop a reconstruction error approach to fault detection and its efficacy is validated using numerical simulations.
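The occupation-kernel idea can be read in code as follows. The feature of a trajectory γ on [0, T] is Γ_γ = ∫ K(·, γ(t)) dt. The inner product of two trajectories is therefore the double integral of K(γ_a(t), γ_b(s)). Kernel PCA on the resulting Gram matrix then yields a reconstruction-error score. The sketch below makes several assumptions: a Gaussian base kernel, trapezoidal quadrature, uncentered kernel PCA, and toy circular trajectories. It is not the authors' implementation, and all names and parameters (sigma, n_components) are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    # Pairwise Gaussian kernel between the rows of A (m x d) and B (p x d).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))


def trapezoid_weights(t):
    # Weights w such that w @ f(t) approximates the integral of f over [t[0], t[-1]].
    dt = np.diff(t)
    w = np.zeros_like(t)
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w


def occupation_inner(traj_a, traj_b, sigma=0.5):
    # <Gamma_a, Gamma_b> = double integral of K(gamma_a(t), gamma_b(s)) dt ds.
    (ta, xa), (tb, xb) = traj_a, traj_b
    return trapezoid_weights(ta) @ gaussian_kernel(xa, xb, sigma) @ trapezoid_weights(tb)


def fit_occupation_kpca(trajs, n_components):
    # Uncentered kernel PCA on the occupation-kernel Gram matrix.
    G = np.array([[occupation_inner(a, b) for b in trajs] for a in trajs])
    lam, V = np.linalg.eigh(G)
    top = np.argsort(lam)[::-1][:n_components]
    return V[:, top] / np.sqrt(lam[top])   # expansion coefficients of the PCs


def reconstruction_error(traj, trajs, alphas):
    # Squared feature-space distance between Gamma_traj and its projection.
    k = np.array([occupation_inner(traj, b) for b in trajs])
    proj = alphas.T @ k
    return occupation_inner(traj, traj) - proj @ proj


rng = np.random.default_rng(1)


def circle(center, m):
    # One period of x' = (-y, x) around `center`, sampled at m irregular times.
    t = np.sort(np.concatenate([[0.0, 2 * np.pi], rng.uniform(0, 2 * np.pi, m - 2)]))
    phase = rng.uniform(0, 2 * np.pi)
    return t, np.c_[np.cos(t + phase), np.sin(t + phase)] + center


train = [circle(np.zeros(2), int(rng.integers(150, 250))) for _ in range(10)]
alphas = fit_occupation_kpca(train, n_components=1)
train_errs = [reconstruction_error(tr, train, alphas) for tr in train]
fault_err = reconstruction_error(circle(np.array([3.0, 3.0]), 200), train, alphas)
```

Each trajectory has a different number of irregular samples, yet enters only through integrals, so no resampling is needed. A fault would be flagged when the error exceeds a threshold calibrated on fault-free trajectories, for example a high quantile of `train_errs`. That threshold rule is also an assumption.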
Regularized Multivariate Functional Principal Component Analysis
Haghbin, Hossein, Zhao, Yue, Maadooliat, Mehdi
Multivariate Functional Principal Component Analysis (MFPCA) is a valuable tool for exploring relationships and identifying shared patterns of variation in multivariate functional data. However, controlling the roughness of the extracted Principal Components (PCs) can be challenging. This paper introduces a novel approach called regularized MFPCA (ReMFPCA) to address this issue and enhance the smoothness and interpretability of the multivariate functional PCs. ReMFPCA incorporates a roughness penalty within a penalized framework, using a parameter vector to regulate the smoothness of each functional variable. The proposed method generates smoothed multivariate functional PCs, providing a concise and interpretable representation of the data. Extensive simulations and real data examples demonstrate the effectiveness of ReMFPCA and its superiority over alternative methods. The proposed approach opens new avenues for analyzing and uncovering relationships in complex multivariate functional datasets.
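The sketch below is a discretized version of roughness-penalized multivariate FPCA. It follows Silverman-style penalized eigenproblems rather than the paper's exact estimator. Curves are assumed to be observed on uniform grids, and roughness is measured by squared second differences. Each variable j gets its own smoothing weight alpha_j, which plays the role of the parameter vector. The leading components solve the generalized eigenproblem C v = λ (I + P_α) v, where P_α is block-diagonal with blocks α_j D_jᵀD_j. The discretization, the penalty, the names, and the toy data are all assumptions.

```python
import numpy as np
from scipy.linalg import block_diag, eigh


def second_diff_penalty(m):
    # D has rows [1, -2, 1]; v @ D.T @ D @ v is the squared second-difference roughness.
    D = np.diff(np.eye(m), n=2, axis=0)
    return D.T @ D


def remfpca_sketch(curves, alphas, k):
    # curves[j]: (n, m_j) observations of functional variable j on a uniform grid.
    X = np.hstack(curves)
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (X.shape[0] - 1)
    P = block_diag(*[a * second_diff_penalty(c.shape[1]) for a, c in zip(alphas, curves)])
    lam, V = eigh(C, np.eye(C.shape[0]) + P)   # C v = lam (I + P) v, ascending
    return lam[::-1][:k], V[:, ::-1][:, :k]


rng = np.random.default_rng(0)
n, t1, t2 = 40, np.linspace(0, 1, 50), np.linspace(0, 1, 30)
scores = rng.normal(size=(n, 1))
x1 = scores * np.sin(2 * np.pi * t1) + 0.3 * rng.normal(size=(n, 50))
x2 = scores * np.cos(2 * np.pi * t2) + 0.3 * rng.normal(size=(n, 30))
_, V_raw = remfpca_sketch([x1, x2], alphas=(0.0, 0.0), k=1)
_, V_pen = remfpca_sketch([x1, x2], alphas=(10.0, 1.0), k=1)
```

The leading penalized component maximizes variance divided by ‖v‖² plus the α-weighted roughness. Its α-weighted roughness per unit norm therefore cannot exceed that of the unpenalized component. Per-variable weights let a noisier variable be smoothed more strongly than the others.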
Two derivations of Principal Component Analysis on datasets of distributions
In this brief note, we formulate Principal Component Analysis (PCA) over datasets consisting not of points but of distributions, characterized by their location and covariance. Just like the usual PCA on points can be equivalently derived via a variance-maximization principle and via a minimization of reconstruction error, we derive a closed-form solution for distributional PCA from both of these perspectives.
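The sketch below uses one assumption, stated explicitly: each distribution is treated as an equal-weight mixture component. Under it, the variance-maximization view reduces to ordinary PCA of the mixture covariance. The law of total covariance splits that covariance into the covariance of the locations plus the average of the covariances. This may not coincide with the note's own derivation. The function name and toy inputs are hypothetical.

```python
import numpy as np


def distributional_pca(means, covs, k):
    # means: (N, d) locations; covs: (N, d, d) covariances of the N distributions.
    means, covs = np.asarray(means, dtype=float), np.asarray(covs, dtype=float)
    centered = means - means.mean(axis=0)
    between = centered.T @ centered / len(means)   # spread of the locations
    within = covs.mean(axis=0)                     # average spread inside each distribution
    lam, U = np.linalg.eigh(between + within)
    top = np.argsort(lam)[::-1][:k]
    return lam[top], U[:, top]


means = [[0.0, 0.0], [3.0, 1.0], [-1.0, 2.0]]
covs = [[[1.0, 0.3], [0.3, 0.5]], [[1.0, 0.0], [0.0, 1.0]], [[0.5, -0.2], [-0.2, 1.0]]]
lam, U = distributional_pca(means, covs, k=2)
```

Setting every covariance to zero recovers ordinary PCA on the locations, which is a quick sanity check of the formula.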
On the use of the Gram matrix for multivariate functional principal components analysis
Golovkine, Steven, Gunning, Edward, Simpkin, Andrew J., Bargary, Norma
Dimension reduction is crucial in functional data analysis (FDA). The key tool to reduce the dimension of the data is functional principal component analysis. Existing approaches for functional principal component analysis usually involve the diagonalization of the covariance operator. With the increasing size and complexity of functional datasets, estimating the covariance operator has become more challenging. Therefore, there is a growing need for efficient methodologies to estimate the eigencomponents. Using the duality of the space of observations and the space of functional features, we propose to use the inner-product between the curves to estimate the eigenelements of multivariate and multidimensional functional datasets. The relationship between the eigenelements of the covariance operator and those of the inner-product matrix is established. We explore the application of these methodologies in several FDA settings and provide general guidance on their usability.
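The duality can be checked numerically on discretized curves. Assume each variable is observed on its own grid with quadrature weights w. The n × n inner-product matrix G = X W Xᵀ / n (with X centered) then has the same nonzero eigenvalues as the discretized covariance operator. Each eigenfunction is recovered as φ = Xᵀ v / √(n λ). The trapezoid weights, the function names, and the toy curves are assumptions for illustration, not the authors' code.

```python
import numpy as np


def trapezoid_weights(t):
    # Quadrature weights on the grid t (trapezoid rule).
    dt = np.diff(t)
    w = np.zeros_like(t)
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w


def fpca_via_gram(curves, weights, k):
    # curves[p]: (n, m_p) values of variable p; weights[p]: (m_p,) quadrature weights.
    X = np.hstack(curves)
    w = np.concatenate(weights)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    G = (Xc * w) @ Xc.T / n                 # inner products between curves, n x n
    lam, V = np.linalg.eigh(G)
    top = np.argsort(lam)[::-1][:k]
    lam, V = lam[top], V[:, top]
    phi = Xc.T @ V / np.sqrt(n * lam)       # eigenfunctions with unit (weighted) L2 norm
    splits = np.cumsum([c.shape[1] for c in curves])[:-1]
    return lam, np.split(phi, splits, axis=0)   # one block of eigenfunctions per variable


rng = np.random.default_rng(0)
n, t1, t2 = 30, np.linspace(0, 1, 50), np.linspace(0, 1, 40)
x1 = rng.normal(size=(n, 3)) @ np.vstack([np.sin(np.pi * j * t1) for j in (1, 2, 3)])
x2 = rng.normal(size=(n, 2)) @ np.vstack([np.cos(np.pi * j * t2) for j in (1, 2)])
lam, phis = fpca_via_gram([x1, x2], [trapezoid_weights(t1), trapezoid_weights(t2)], k=3)
```

The eigenproblem has size n regardless of how fine the grids are. That is the motivation for the Gram-matrix route when curves are densely sampled or multidimensional.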
PLPCA: Persistent Laplacian Enhanced-PCA for Microarray Data Analysis
Cottrell, Sean, Wang, Rui, Wei, Guowei
Over the years, Principal Component Analysis (PCA) has served as the baseline approach for dimensionality reduction in gene expression data analysis. Its primary objective is to identify a subset of disease-causing genes from a vast pool of thousands of genes. However, PCA possesses inherent limitations that hinder its interpretability, introduce classification ambiguity, and fail to capture complex geometric structures in the data. Although these limitations have been partially addressed in the literature by incorporating various regularizers, such as graph Laplacian regularization, existing improved PCA methods still face challenges related to multiscale analysis and capturing higher-order interactions in the data. To address these challenges, we propose a novel approach called Persistent Laplacian-enhanced Principal Component Analysis (PLPCA). PLPCA amalgamates the advantages of earlier regularized PCA methods with persistent spectral graph theory, specifically persistent Laplacians derived from algebraic topology. In contrast to graph Laplacians, persistent Laplacians enable multiscale analysis through filtration and incorporate higher-order simplicial complexes to capture higher-order interactions in the data. We evaluate and validate the performance of PLPCA using benchmark microarray datasets that involve normal tissue samples and four different cancer tissues. Our extensive studies demonstrate that PLPCA outperforms all other state-of-the-art models for classification tasks after dimensionality reduction.
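The abstract builds on graph-Laplacian-regularized PCA. Below is a minimal sketch of that baseline only. The persistent-Laplacian extension and its multiscale filtration are not reproduced here. The sketch assumes a common formulation: min ‖X − UQᵀ‖²_F + α tr(QᵀLQ) subject to QᵀQ = I. Substituting the optimal U = XQ reduces it to taking the k eigenvectors of −XᵀX + αL with the smallest eigenvalues. The kNN graph construction, the names, and the toy data are hypothetical.

```python
import numpy as np


def knn_laplacian(X, k):
    # X: (p, n) with samples as columns. Symmetric kNN graph with 0/1 weights.
    S = X.T
    d2 = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=-1)
    n = S.shape[0]
    nbrs = np.argsort(d2, axis=1)[:, 1:k + 1]    # column 0 is the sample itself
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)
    return np.diag(A.sum(axis=1)) - A


def laplacian_regularized_pca(X, n_components, alpha, L):
    # min ||X - U Q^T||_F^2 + alpha * tr(Q^T L Q)  s.t.  Q^T Q = I.
    Xc = X - X.mean(axis=1, keepdims=True)
    lam, V = np.linalg.eigh(-Xc.T @ Xc + alpha * L)   # ascending eigenvalues
    Q = V[:, :n_components]                           # smallest ones give the optimum
    return Xc @ Q, Q                                  # U = X Q, sample embedding Q


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 40)) * np.r_[5.0, 4.0, np.ones(18)][:, None]   # 20 genes x 40 samples
L = knn_laplacian(X, k=5)
U0, Q0 = laplacian_regularized_pca(X, 2, alpha=0.0, L=L)
U1, Q1 = laplacian_regularized_pca(X, 2, alpha=5.0, L=L)
```

With alpha = 0, the embedding Q spans the leading right singular vectors of the centered data, i.e. ordinary PCA scores. Increasing alpha trades captured variance for smoothness of Q over the sample graph.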
Yet Another Algorithm for Supervised Principal Component Analysis: Supervised Linear Centroid-Encoder
Ghosh, Tomojit, Kirby, Michael
We propose a new supervised dimensionality reduction technique called Supervised Linear Centroid-Encoder (SLCE), a linear counterpart of the nonlinear Centroid-Encoder (CE) (Ghosh and Kirby, 2022). SLCE maps the samples of a class to their class centroid using a linear transformation. The transformation is a projection that reconstructs each point so that its distance from the corresponding class centroid, i.e., the centroid-reconstruction loss, is minimized in the ambient space. We derive a closed-form solution using an eigendecomposition of a symmetric matrix. We present a detailed analysis and some key mathematical properties of the proposed approach. We establish a connection between the eigenvalues and the centroid-reconstruction loss. In contrast to Principal Component Analysis (PCA), which reconstructs a sample in the ambient space, SLCE uses the instances of a class to rebuild the corresponding class centroid. Therefore, the proposed method can be considered a form of supervised PCA. Experimental results show the performance advantage of SLCE over other supervised methods.
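The sketch below assumes one reading of the abstract. SLCE seeks an orthonormal basis A (d × k) minimizing Σᵢ ‖c_{yᵢ} − AAᵀxᵢ‖², where c_{yᵢ} is the centroid of sample i's class. Expanding the square gives the loss Σᵢ‖c_{yᵢ}‖² − tr(AᵀMA), with the symmetric matrix M = XᵀC + CᵀX − XᵀX. So A collects the top-k eigenvectors of M. The minimum loss equals Σᵢ‖c_{yᵢ}‖² minus the sum of those eigenvalues. This mirrors the eigenvalue/loss connection mentioned above, but the exact matrix in the paper may differ. Names and data are hypothetical.

```python
import numpy as np


def slce_fit(X, y, k):
    # X: (n, d) samples as rows; y: (n,) class labels.
    classes = np.unique(y)
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])
    C = centroids[np.searchsorted(classes, y)]      # row i = centroid of sample i's class
    M = X.T @ C + C.T @ X - X.T @ X                 # symmetric d x d matrix
    lam, V = np.linalg.eigh(M)
    top = np.argsort(lam)[::-1][:k]
    return V[:, top], lam[top], C


def centroid_loss(A, X, C):
    # Sum of squared distances between projected samples and their class centroids.
    return np.sum((C - X @ A @ A.T) ** 2)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, size=(50, 5)) for m in (0.0, 3.0, -2.0)])
y = np.repeat([0, 1, 2], 50)
A, lam, C = slce_fit(X, y, k=2)
```

By the Ky Fan maximum principle, no other orthonormal d × k basis attains a lower centroid-reconstruction loss under this formulation.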
Fair principal component analysis (PCA): minorization-maximization algorithms for Fair PCA, Fair Robust PCA and Fair Sparse PCA
In this paper, we propose a new iterative algorithm to solve the fair PCA (FPCA) problem. We start with the max-min fair PCA formulation originally proposed in [1] and derive a simple and efficient iterative algorithm based on the minorization-maximization (MM) approach. The proposed algorithm relies on a relaxation of a semi-orthogonality constraint, which is proved to be tight at every iteration of the algorithm. The vanilla version of the proposed algorithm requires solving a semidefinite program (SDP) at every iteration. This can be further simplified to a quadratic program by formulating the dual of the surrogate maximization problem. We also propose two important reformulations of the fair PCA problem: a) fair robust PCA, which can handle outliers in the data, and b) fair sparse PCA, which can enforce sparsity on the estimated fair principal components. The proposed algorithms are computationally efficient and monotonically increase their respective design objectives at every iteration. An added feature of the proposed algorithms is that they do not require the selection of any hyperparameter, except in the fair sparse PCA case, where the user must choose a penalty parameter that controls the sparsity. We numerically compare the performance of the proposed methods with two state-of-the-art approaches on synthetic data sets and a real-life data set.
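The max-min idea can be illustrated with a toy objective, assumed for this sketch: the worst group's variance captured by an orthonormal basis U, min_g tr(Uᵀ C_g U). The formulation in [1] may normalize these terms differently. The brute-force angle search below is only feasible in two dimensions. It stands in for the paper's minorization-maximization algorithm and is not that algorithm. The group covariances and sizes are hypothetical.

```python
import numpy as np


def group_variances(U, covs):
    # Variance of each group captured by the columns of U (d x k, orthonormal).
    return np.array([np.trace(U.T @ C @ U) for C in covs])


def fair_objective(U, covs):
    # Max-min fair PCA maximizes the worst-off group's captured variance.
    return group_variances(U, covs).min()


covs = [np.diag([4.0, 0.1]), np.diag([0.1, 4.0])]   # group A varies along x, group B along y
sizes = np.array([90, 10])                           # group A dominates the pooled data
pooled = sum(s * C for s, C in zip(sizes, covs)) / sizes.sum()
u_pca = np.linalg.eigh(pooled)[1][:, [-1]]           # standard PCA direction (d x 1)

thetas = np.linspace(0.0, np.pi, 3601)
candidates = [np.array([[np.cos(t)], [np.sin(t)]]) for t in thetas]
u_fair = max(candidates, key=lambda u: fair_objective(u, covs))
```

Pooled PCA follows the majority group and captures a variance of only 0.1 for the minority group. The max-min direction is diagonal and captures 2.05 for both groups.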