Goto

Collaborating Authors

 Principal Component Analysis


Principal Component Analysis (PCA) for Machine Learning

#artificialintelligence

Sometimes there are situations where the dataset contains many features or as we call it High Dimensionality. High dimensionality datasets can cause a lot of issues, the most common issue that occurs is Overfitting, which means the model is not able to generalize beyond the training data. Therefore, we have to employ a special technique called Dimensionality Reduction Technique to deal with this High Dimensionality. One of the best techniques that we use is known as Principal Component Analysis(PCA). Principal Component Analysis is one of the best Dimensionality Reduction Techniques available in Machine Learning.


A novel approach for Fair Principal Component Analysis based on eigendecomposition

arXiv.org Artificial Intelligence

Principal component analysis (PCA), a ubiquitous dimensionality reduction technique in signal processing, searches for a projection matrix that minimizes the mean squared error between the reduced dataset and the original one. Since classical PCA is not tailored to address concerns related to fairness, its application to actual problems may lead to disparity in the reconstruction errors of different groups (e.g., men and women, whites and blacks, etc.), with potentially harmful consequences such as the introduction of bias towards sensitive groups. Although several fair versions of PCA have been proposed recently, there still remains a fundamental gap in the search for algorithms that are simple enough to be deployed in real systems. To address this, we propose a novel PCA algorithm which tackles fairness issues by means of a simple strategy comprising a one-dimensional search which exploits the closed-form solution of PCA. As attested by numerical experiments, the proposal can significantly improve fairness with a very small loss in the overall reconstruction error and without resorting to complex optimization schemes. Moreover, our findings are consistent in several real situations as well as in scenarios with both unbalanced and balanced datasets.


Meta Sparse Principal Component Analysis

arXiv.org Artificial Intelligence

We study the meta-learning for support (i.e. the set of non-zero entries) recovery in high-dimensional Principal Component Analysis. We reduce the sufficient sample complexity in a novel task with the information that is learned from auxiliary tasks. We assume each task to be a different random Principal Component (PC) matrix with a possibly different support and that the support union of the PC matrices is small. We then pool the data from all the tasks to execute an improper estimation of a single PC matrix by maximising the $l_1$-regularised predictive covariance to establish that with high probability the true support union can be recovered provided a sufficient number of tasks $m$ and a sufficient number of samples $ O\left(\frac{\log(p)}{m}\right)$ for each task, for $p$-dimensional vectors. Then, for a novel task, we prove that the maximisation of the $l_1$-regularised predictive covariance with the additional constraint that the support is a subset of the estimated support union could reduce the sufficient sample complexity of successful support recovery to $O(\log |J|)$, where $J$ is the support union recovered from the auxiliary tasks. Typically, $|J|$ would be much less than $p$ for sparse matrices. Finally, we demonstrate the validity of our experiments through numerical simulations.


Distributed Robust Principal Component Analysis

arXiv.org Artificial Intelligence

We study the robust principal component analysis (RPCA) problem in a distributed setting. The goal of RPCA is to find an underlying low-rank estimation for a raw data matrix when the data matrix is subject to the corruption of gross sparse errors. Previous studies have developed RPCA algorithms that provide stable solutions with fast convergence. However, these algorithms are typically hard to scale and cannot be implemented distributedly, due to the use of either SVD or large matrix multiplication. In this paper, we propose the first distributed robust principal analysis algorithm based on consensus factorization, dubbed DCF-PCA. We prove the convergence of DCF-PCA and evaluate DCF-PCA on various problem settings.


High Dimensional Bayesian Optimization with Kernel Principal Component Analysis

arXiv.org Machine Learning

Bayesian Optimization (BO) is a surrogate-based global optimization strategy that relies on a Gaussian Process regression (GPR) model to approximate the objective function and an acquisition function to suggest candidate points. It is well-known that BO does not scale well for high-dimensional problems because the GPR model requires substantially more data points to achieve sufficient accuracy and acquisition optimization becomes computationally expensive in high dimensions. Several recent works aim at addressing these issues, e.g., methods that implement online variable selection or conduct the search on a lower-dimensional sub-manifold of the original search space. Advancing our previous work of PCA-BO that learns a linear sub-manifold, this paper proposes a novel kernel PCA-assisted BO (KPCA-BO) algorithm, which embeds a non-linear sub-manifold in the search space and performs BO on this sub-manifold. Intuitively, constructing the GPR model on a lower-dimensional sub-manifold helps improve the modeling accuracy without requiring much more data from the objective function. Also, our approach defines the acquisition function on the lower-dimensional sub-manifold, making the acquisition optimization more manageable. We compare the performance of KPCA-BO to a vanilla BO and to PCA-BO on the multi-modal problems of the COCO/BBOB benchmark suite. Empirical results show that KPCA-BO outperforms BO in terms of convergence speed on most test problems, and this benefit becomes more significant when the dimensionality increases. For the 60D functions, KPCA-BO achieves better results than PCA-BO for many test cases. Compared to the vanilla BO, it efficiently reduces the CPU time required to train the GPR model and to optimize the acquisition function compared to the vanilla BO.


Riemannian CUR Decompositions for Robust Principal Component Analysis

arXiv.org Artificial Intelligence

Robust Principal Component Analysis (PCA) has received massive attention in recent years. It aims to recover a low-rank matrix and a sparse matrix from their sum. This paper proposes a novel nonconvex Robust PCA algorithm, coined Riemannian CUR (RieCUR), which utilizes the ideas of Riemannian optimization and robust CUR decompositions. This algorithm has the same computational complexity as Iterated Robust CUR, which is currently state-of-the-art, but is more robust to outliers. RieCUR is also able to tolerate a significant amount of outliers, and is comparable to Accelerated Alternating Projections, which has high outlier tolerance but worse computational complexity than the proposed method. Thus, the proposed algorithm achieves state-of-the-art performance on Robust PCA both in terms of computational complexity and outlier tolerance.


Principal Component Analysis (PCA) with Python Examples -- Tutorial

#artificialintelligence

When implementing machine learning algorithms, the inclusion of more features might lead to worsening performance issues. Increasing the number of features will not always improve classification accuracy, which is also known as the curse of dimensionality. Hence, we apply dimensionality reduction to improve classification accuracy by selecting the optimal set of lower dimensionality features. Principal component analysis (PCA) is essential for data science, machine learning, data visualization, statistics, and other quantitative fields. It is essential to know about vector, matrix, and transpose matrix, eigenvalues, eigenvectors, and others to understand the concept of dimensionality reduction.


Dimensionality Reduction: Principal Component Analysis

#artificialintelligence

A dataset is made up of a number of features. As long as these features are related in someway to the target and are optimal in number a machine learning model will be able to produce decent results after learning from the data. But if the number of features are high and most of the features do not contribute towards the model's learning then the performance of the model will go down and the time taken to output predictions also increases. The process of reducing the number of dimensions by transforming the original feature space into a subspace is one method of performing dimensionality reduction and Principal Component Analysis (PCA) does this. So let's take a look into the building concepts of PCA.


Explanation of Principal Component Analysis (PCA)

#artificialintelligence

In this example, the PCA is implemented to project one hundred 2-D data on 1-D space. The first figure(Fig1) shows the elliptical distribution of X with principal component directions.


Introduction to Principal Component Analysis - DataScienceCentral.com

#artificialintelligence

This formula-free summary provides a short overview about how PCA (principal component analysis) works for dimension reduction, that is, to select k features (also called variables) among a larger set of n features, with k much smaller than n. This smaller set of k features built with PCA is the best subset of k features, in the sense that it minimizes the variance of the residual noise when fitting data to a linear model. Note that PCA transforms the initial features into new ones, that are linear combinations of the original features. The proportion of the variance that each eigenvector represents can be calculated by dividing the eigenvalue corresponding to that eigenvector by the sum of all eigenvalues. If the original features are highly correlated, the solution will be very unstable.