Principal Component Analysis


Supervised PCA: A Multiobjective Approach

arXiv.org Machine Learning

Methods for supervised principal component analysis (SPCA) aim to incorporate label information into principal component analysis (PCA), so that the extracted features are more useful for a prediction task of interest. Prior work on SPCA has focused primarily on optimizing prediction error, and has neglected the value of maximizing the variance explained by the extracted features. We propose a new method for SPCA that addresses both objectives jointly, and demonstrate empirically that our approach dominates existing approaches, i.e., outperforms them with respect to both prediction error and variance explained. Our approach accommodates arbitrary supervised learning losses and, through a statistical reformulation, provides a novel low-rank extension of generalized linear models.
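
A minimal sketch of the multiobjective idea, not the authors' algorithm: score a projection by a weighted combination of a supervised least-squares loss and the variance the projected features explain. The weight `alpha`, the synthetic data, and all names are illustrative assumptions.

```python
# Illustrative multiobjective score for supervised PCA: trade off a
# supervised loss against variance explained by the projection W.
# NOT the paper's algorithm; alpha and the data are assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)
X -= X.mean(axis=0)  # center, as in ordinary PCA

def objective(W, beta, alpha=0.5):
    Z = X @ W                                  # extracted features
    pred_loss = np.mean((Z @ beta - y) ** 2)   # supervised objective
    var_explained = np.trace(W.T @ X.T @ X @ W) / len(X)
    return alpha * pred_loss - (1 - alpha) * var_explained

# Evaluate at the plain-PCA solution (orthonormal top-2 directions).
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:2].T
beta = np.linalg.lstsq(X @ W, y, rcond=None)[0]
print(objective(W, beta))
```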


Principal Component Analysis (PCA) with Scikit-learn

#artificialintelligence

This is the second unsupervised machine learning algorithm that I'm discussing here; this time, the topic is Principal Component Analysis (PCA). At the beginning of the tutorial, I'll explain what the dimensionality of a dataset is, what dimensionality reduction means, the main approaches to and reasons for dimensionality reduction, and what PCA is. Then, I will go deeper into PCA by implementing the algorithm with the Scikit-learn machine learning library. This will help you apply PCA to a real-world dataset and get results quickly. In a separate article (not this one), I will discuss the mathematics behind principal component analysis by executing the algorithm manually with the powerful numpy and pandas libraries.
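
For readers who want a preview, here is a minimal sketch of the scikit-learn workflow the tutorial builds up to; the Iris dataset and the choice of two components are illustrative assumptions, not necessarily the article's.

```python
# Minimal PCA workflow with scikit-learn: standardize, fit, transform,
# and inspect how much variance each component explains.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # PCA is scale-sensitive

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)        # 150 x 4 -> 150 x 2

print(pca.explained_variance_ratio_)           # variance share per component
```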


Principal Component Analysis (PCA)

#artificialintelligence

During the data mining process, we are given raw data. Before visualizing or interpreting it, we have to make sure certain refinement methods have been applied. This refinement process starts with preprocessing, or cleaning, the data, such as removing null or blank values. Next comes feature selection or feature extraction, which is where PCA is used: the least-contributing features are removed as required. The last stage is data transformation, where normalization techniques are applied to scale all the features to the same range.
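
One way to express that refinement sequence as a single scikit-learn pipeline; the article names no specific library, so this sketch and its toy data are illustrative. Note that in practice scaling usually precedes PCA, so the pipeline reorders the article's last two stages.

```python
# Clean (impute nulls) -> transform (scale) -> extract features (PCA),
# chained as one pipeline. Toy data and parameter choices are assumptions.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.array([[1.0, 2.0, np.nan],
              [4.0, np.nan, 6.0],
              [7.0, 8.0, 9.0],
              [2.0, 3.0, 5.0]])

pipeline = Pipeline([
    ("clean", SimpleImputer(strategy="mean")),  # preprocessing / cleaning
    ("scale", StandardScaler()),                # scale features to same range
    ("reduce", PCA(n_components=2)),            # drop least-contributing directions
])
print(pipeline.fit_transform(X))
```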


On Robust Probabilistic Principal Component Analysis using Multivariate $t$-Distributions

arXiv.org Machine Learning

Principal Component Analysis (PCA) is a common multivariate statistical analysis method, and Probabilistic Principal Component Analysis (PPCA) is its probabilistic reformulation under the framework of the Gaussian latent variable model. To improve the robustness of PPCA, it has been proposed to change the underlying Gaussian distributions to multivariate $t$-distributions. Based on the representation of the $t$-distribution as a scale mixture of Gaussians, a hierarchical model is used for implementation. However, although the robust PPCA methods work reasonably well in some simulation studies and on real data, the hierarchical model as implemented does not yield an equivalent interpretation. In this paper, we present a set of equivalence relationships between these models, and discuss the performance of robust PPCA methods using different multivariate $t$-distributed structures through several simulation studies. In doing so, we clarify a current misrepresentation in the literature and make connections between a set of hierarchical models for robust PPCA.
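
The hierarchical model the abstract refers to rests on a standard fact: a multivariate $t$ draw is a Gaussian draw whose covariance is rescaled by a latent Gamma variable. A minimal sketch, with illustrative parameter values:

```python
# Scale-mixture representation of the multivariate t-distribution:
# u ~ Gamma(nu/2, rate=nu/2), then x | u ~ N(mu, Sigma / u) gives
# x ~ t_nu(mu, Sigma). Dimensions and nu below are assumptions.
import numpy as np

rng = np.random.default_rng(0)
nu, d, n = 4.0, 3, 1000                  # degrees of freedom, dim, samples
mu, Sigma = np.zeros(d), np.eye(d)

u = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)   # rate nu/2 -> scale 2/nu
z = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
x = mu + z / np.sqrt(u)[:, None]                    # heavier tails than z
print(x.var(axis=0))                                # ~ nu/(nu-2) = 2 per axis
```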


Rapid Robust Principal Component Analysis: CUR Accelerated Inexact Low Rank Estimation

arXiv.org Artificial Intelligence

Robust principal component analysis (RPCA) is a widely used tool for dimension reduction. In this work, we propose a novel non-convex algorithm, coined Iterated Robust CUR (IRCUR), for solving RPCA problems, which dramatically improves computational efficiency in comparison with existing algorithms. IRCUR achieves this acceleration by employing a CUR decomposition when updating the low-rank component, which allows us to obtain an accurate low-rank approximation from only three small submatrices. Consequently, IRCUR processes only small submatrices and avoids expensive computation on the full matrix throughout the entire algorithm. Numerical experiments establish the computational advantage of IRCUR over state-of-the-art algorithms on both synthetic and real-world datasets.
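
A minimal sketch of the CUR decomposition that IRCUR builds on: approximate a low-rank matrix from a few sampled columns, a few sampled rows, and their intersection. Uniform sampling and the sizes below are illustrative assumptions, not the paper's scheme.

```python
# CUR approximation A ~= C @ pinv(U) @ R from three small submatrices.
# For an exactly rank-r matrix and generic samples of >= r rows/columns,
# the recovery is exact.
import numpy as np

rng = np.random.default_rng(0)
r = 5
A = rng.normal(size=(500, r)) @ rng.normal(size=(r, 400))  # exactly rank 5

rows = rng.choice(500, size=4 * r, replace=False)   # O(r) sampled rows
cols = rng.choice(400, size=4 * r, replace=False)   # O(r) sampled columns
C, R = A[:, cols], A[rows, :]
U = A[np.ix_(rows, cols)]                           # intersection submatrix

A_hat = C @ np.linalg.pinv(U) @ R
print(np.linalg.norm(A - A_hat) / np.linalg.norm(A))  # ~0 up to rounding
```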


Understanding Principal Component Analysis

#artificialintelligence

Machine learning (ML) is a subset of artificial intelligence (AI) that provides systems with the ability to automatically learn and improve from experience without being explicitly programmed. The algorithms employed within ML are used to find patterns in data that generate insight and help make data-driven decisions and predictions. These algorithms are used every day to make critical decisions in medical diagnosis, stock trading, transportation, legal matters and much more. It is therefore easy to see why data scientists place ML on such a high pedestal: it provides a medium for high-priority decisions that can guide better business and smarter actions, in real time, without much human intervention. To learn, ML models use computational methods to understand information directly from data without relying on a predetermined equation.


Machine Learning approach to muon spectroscopy analysis

arXiv.org Machine Learning

In recent years, artificial intelligence techniques have proved to be very successful when applied to problems in the physical sciences. Here we apply an unsupervised machine learning (ML) algorithm called principal component analysis (PCA) as a tool to analyse the data from muon spectroscopy experiments. Specifically, we apply the ML technique to detect phase transitions in various materials. The measured quantity in muon spectroscopy is an asymmetry function, which may hold information about the distribution of the intrinsic magnetic field in combination with the dynamics of the sample. Sharp changes in the shape of asymmetry functions measured at different temperatures might indicate a phase transition. Existing methods of processing muon spectroscopy data are based on regression analysis, but choosing the right fitting function requires knowledge about the underlying physics of the probed material. Conversely, principal component analysis focuses on small differences in the asymmetry curves and works without any prior assumptions about the studied samples. We found that the PCA method works well in detecting phase transitions in muon spectroscopy experiments and can serve as an alternative to current analyses, especially if the physics of the studied material is not entirely known. Additionally, we found that our ML technique seems to work best with large numbers of measurements, regardless of whether the algorithm takes data for only a single material or the analysis is performed simultaneously for many materials with different physical properties.
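
A minimal sketch of the kind of analysis described, using synthetic asymmetry curves (the temperatures, relaxation rates, and transition point are invented for illustration): stack one curve per temperature, run PCA, and look for a sharp jump in the leading component's score.

```python
# Detect a shape change across temperatures via the first PCA score.
# Synthetic exponential "asymmetry" curves stand in for real muon data.
import numpy as np
from sklearn.decomposition import PCA

t = np.linspace(0, 10, 200)                  # time bins
temps = np.linspace(1, 20, 40)               # measurement temperatures
rate = np.where(temps < 10.0, 2.0, 0.3)      # relaxation changes at "Tc"=10
curves = np.exp(-np.outer(rate, t))          # one curve per temperature

scores = PCA(n_components=1).fit_transform(curves)[:, 0]
jump = np.argmax(np.abs(np.diff(scores)))    # sharpest change in PC1 score
print(f"candidate transition between T={temps[jump]:.1f} and T={temps[jump+1]:.1f}")
```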


A Framework for Private Matrix Analysis

arXiv.org Machine Learning

We study private matrix analysis in the sliding window model, where only the last $W$ updates to matrices are considered useful for analysis. We give the first efficient $o(W)$-space differentially private algorithms for spectral approximation, principal component analysis, and linear regression. We also initiate the study of, and give efficient differentially private algorithms for, two important variants of principal component analysis: sparse principal component analysis and non-negative principal component analysis. Prior to our work, no such results were known for sparse and non-negative differentially private principal component analysis, even in the static data setting. These algorithms are obtained by identifying sufficient conditions on positive semidefinite matrices formed from streamed matrices. We also show a lower bound on the space required to compute a low-rank approximation, even if the algorithm gives a multiplicative approximation and incurs additive error. This follows via a reduction to a certain communication complexity problem.
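
For orientation, here is the simplest static-setting baseline for differentially private PCA (input perturbation: add symmetric Gaussian noise to the covariance, then eigendecompose). This is not the paper's sliding-window algorithm; epsilon, delta, and the unit-row-norm sensitivity treatment are assumptions.

```python
# Differentially private PCA via a noisy covariance (Gaussian mechanism).
# Rows are normalized so one row changes X.T @ X by a bounded amount;
# the sensitivity is treated as 1 here purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)  # ||row|| <= 1

eps, delta = 1.0, 1e-5
sigma = np.sqrt(2 * np.log(1.25 / delta)) / eps   # Gaussian-mechanism scale

noise = rng.normal(scale=sigma, size=(20, 20))
noise = np.triu(noise) + np.triu(noise, 1).T      # symmetrize the noise
_, vecs = np.linalg.eigh(X.T @ X + noise)
top5 = vecs[:, -5:]                               # private top-5 subspace
print(top5.shape)
```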


Fast algorithms for robust principal component analysis with an upper bound on the rank

arXiv.org Machine Learning

Robust principal component analysis (RPCA) decomposes a data matrix into a low-rank part and a sparse part. There are mainly two types of algorithms for RPCA. The first type applies regularization terms to the singular values of a matrix to obtain a low-rank matrix; however, calculating singular values can be very expensive for large matrices. The second type represents the low-rank matrix as the product of two small matrices. These algorithms are faster than the first type because no singular value decomposition (SVD) of the full matrix is required; however, the rank of the low-rank matrix must be specified, and an accurate rank estimate is needed to obtain a reasonable solution. In this paper, we propose algorithms that combine both types. Our algorithms require only an upper bound on the rank, together with SVDs on small matrices. First, they are faster than the first type because the cost of SVD on small matrices is negligible. Second, they are more robust than the second type because an upper bound on the rank, rather than the exact rank, suffices. Furthermore, we apply the Gauss-Newton method to increase the speed of our algorithms. Numerical experiments show the superior performance of our proposed algorithms.
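
A minimal sketch of the low-rank-plus-sparse model with a rank upper bound: alternately truncate the residual to rank at most r and keep only the largest-magnitude entries as the sparse part. This naive loop illustrates the model only; the paper's algorithms avoid full SVDs and add a Gauss-Newton acceleration. Sizes, sparsity level, and iteration count are assumptions.

```python
# Naive alternating projections for X = L + S with rank(L) <= r.
import numpy as np

rng = np.random.default_rng(0)
n, true_rank, r = 100, 3, 5                       # r is only an UPPER bound
L_true = rng.normal(size=(n, true_rank)) @ rng.normal(size=(true_rank, n))
S_true = np.zeros((n, n))
S_true.flat[rng.choice(n * n, size=200, replace=False)] = rng.normal(scale=10, size=200)
X = L_true + S_true

S = np.zeros_like(X)
for _ in range(30):
    U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
    L = (U[:, :r] * s[:r]) @ Vt[:r]               # best rank-<=r fit to X - S
    resid = X - L
    thresh = np.quantile(np.abs(resid), 0.98)     # keep ~2% largest entries
    S = resid * (np.abs(resid) > thresh)

print(np.linalg.norm(L - L_true) / np.linalg.norm(L_true))  # small if recovered
```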


Risks and Caution on applying PCA for Supervised Learning Problems

#artificialintelligence

The curse of dimensionality is a crucial problem when dealing with real-life datasets, which are generally high-dimensional. As the dimensionality of the feature space increases, the number of possible configurations grows exponentially, and so the fraction of configurations covered by any one observation decreases. In such a scenario, principal component analysis plays a major part in efficiently reducing the dimensionality of the data while retaining as much as possible of the variation present in the dataset. Before delving into the actual problem, let us give a very brief introduction: the central idea of Principal Component Analysis (PCA) is to reduce the dimensionality of a dataset consisting of a large number of correlated variables, while retaining the maximum possible variation.
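
A minimal sketch of the risk in question: when the direction that separates the classes carries little variance, PCA's leading component discards exactly the feature the supervised task needs. The synthetic data are illustrative.

```python
# PCA keeps the high-variance, label-irrelevant direction and drops the
# low-variance, label-relevant one, destroying classification accuracy.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, size=n)
x1 = rng.normal(scale=10.0, size=n)          # high variance, no label signal
x2 = y + rng.normal(scale=0.3, size=n)       # low variance, carries the label
X = np.column_stack([x1, x2])

X_pca = PCA(n_components=1).fit_transform(X)  # PC1 ~ the x1 direction
clf = LogisticRegression()
print(cross_val_score(clf, X, y, cv=5).mean())      # ~1.0 with both features
print(cross_val_score(clf, X_pca, y, cv=5).mean())  # ~0.5 after PCA
```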