Goto

Collaborating Authors

 Principal Component Analysis


Objective-Sensitive Principal Component Analysis for High-Dimensional Inverse Problems

arXiv.org Machine Learning

We present a novel approach for adaptive, differentiable parameterization of large-scale random fields. If the approach is coupled with any gradient-based optimization algorithm, it can be applied to a variety of optimization problems, including history matching. The developed technique is based on principal component analysis (PCA) but modifies a purely data-driven basis of principal components considering objective function behavior. To define an efficient encoding, Gradient-Sensitive PCA uses an objective function gradient with respect to model parameters. We propose computationally efficient implementations of the technique, and two of them are based on stationary perturbation theory (SPT). Optimality, correctness, and low computational costs of the new encoding approach are tested, verified, and discussed. Three algorithms for optimal parameter decomposition are presented and applied to an objective of 2D synthetic history matching. The results demonstrate improvements in encoding quality regarding objective function minimization and distributional patterns of the desired field. Possible applications and extensions are proposed.


Principal Component Analysis Based on T$\ell_1$-norm Maximization

arXiv.org Machine Learning

Classical principal component analysis (PCA) may suffer from the sensitivity to outliers and noise. Therefore PCA based on $\ell_1$-norm and $\ell_p$-norm ($0 < p < 1$) have been studied. Among them, the ones based on $\ell_p$-norm seem to be most interesting from the robustness point of view. However, their numerical performance is not satisfactory. Note that, although T$\ell_1$-norm is similar to $\ell_p$-norm ($0 < p < 1$) in some sense, it has the stronger suppression effect to outliers and better continuity. So PCA based on T$\ell_1$-norm is proposed in this paper. Our numerical experiments have shown that its performance is superior than PCA-$\ell_p$ and $\ell_p$SPCA as well as PCA, PCA-$\ell_1$ obviously.


Your Ultimate Data Mining & Machine Learning Cheat Sheet

#artificialintelligence

Dimensionality reduction is the process of expressing high-dimensional data in a reduced number of dimensions such that each one contains the most amount of information. Dimensionality reduction may be used for visualization of high-dimensional data or to speed up machine learning models by removing low-information or correlated features. Principal Component Analysis, or PCA, is a popular method of reducing the dimensionality of data by drawing several orthogonal (perpendicular) vectors in the feature space to represent the reduced number of dimensions. The variable number represents the number of dimensions the reduced data will have. In the case of visualization, for example, it would be two dimensions.


Understanding Principal Component Analysis - GreatLearning

#artificialintelligence

While working on different Machine Learning techniques for Data Analysis, we deal with hundreds or thousands of variables. Most of the variables are correlated with each other. Principal Component Analysis and Factor Analysis techniques are used to deal with such scenarios. Principal Component Analysis (PCA) is an unsupervised statistical technique algorithm. PCA is a "dimensionality reduction" method.


What is Principal Component Analysis in Machine Learning? Super Easy!

#artificialintelligence

Do you wanna know What is Principal Component Analysis?. If yes, then this blog is just for you. Here I will discuss What is Principal Component Analysis, its purpose, and How PCA works?. So, give your few minutes to this article in order to get all the details regarding Principal Component Analysis. Principal Component Analysis(PCA) is one of the best-unsupervised algorithms.


A Communication-Efficient Distributed Algorithm for Kernel Principal Component Analysis

arXiv.org Machine Learning

Principal Component Analysis (PCA) is a fundamental technology in machine learning. Nowadays many high-dimension large datasets are acquired in a distributed manner, which precludes the use of centralized PCA due to the high communication cost and privacy risk. Thus, many distributed PCA algorithms are proposed, most of which, however, focus on linear cases. To efficiently extract non-linear features, this brief proposes a communication-efficient distributed kernel PCA algorithm, where linear and RBF kernels are applied. The key is to estimate the global empirical kernel matrix from the eigenvectors of local kernel matrices. The approximate error of the estimators is theoretically analyzed for both linear and RBF kernels. The result suggests that when eigenvalues decay fast, which is common for RBF kernels, the proposed algorithm gives high quality results with low communication cost. Results of simulation experiments verify our theory analysis and experiments on GSE2187 dataset show the effectiveness of the proposed algorithm.


An overview of Principal Component Analysis

#artificialintelligence

This article will explain you what Principal Component Analysis (PCA) is, why we need it and how we use it. I will try to make it as simple as possible while avoiding hard examples or words which can cause a headache. A moment of honesty: to fully understand this article, a basic understanding of some linear algebra and statistics is essential. Let's say we have 10 variables in our dataset and let's assume that 3 variables capture 90% of the dataset, and 7 variables capture 10% of the dataset. Let's say we want to visualize 10 variables.


Distributed Estimation for Principal Component Analysis: a Gap-free Approach

arXiv.org Machine Learning

The growing size of modern data sets brings many challenges to the existing statistical estimation approaches, which calls for new distributed methodologies. This paper studies distributed estimation for a fundamental statistical machine learning problem, principal component analysis (PCA). Despite the massive literature on top eigenvector estimation, much less is presented for the top-$L$-dim ($L > 1$) eigenspace estimation, especially in a distributed manner. We propose a novel multi-round algorithm for constructing top-$L$-dim eigenspace for distributed data. Our algorithm takes advantage of shift-and-invert preconditioning and convex optimization. Our estimator is communication-efficient and achieves a fast convergence rate. In contrast to the existing divide-and-conquer algorithm, our approach has no restriction on the number of machines. Theoretically, we establish a gap-free error bound and abandon the assumption on the sharp eigengap between the $L$-th and the ($L+1$)-th eigenvalues. Our distributed algorithm can be applied to a wide range of statistical problems based on PCA. In particular, this paper illustrates two important applications, principal component regression and single index model, where our distributed algorithm can be extended. Finally, We provide simulation studies to demonstrate the performance of the proposed distributed estimator.


Machine Learning in Python: Principal Component Analysis (PCA) for Handling High-Dimensional Data

#artificialintelligence

Machine Learning in Python: Principal Component Analysis (PCA) for Handling High-Dimensional Data In this video, I will be showing you how to perform principal component analysis (PCA) in Python using the scikit-learn package. PCA represents a powerful learning approach that enables the analysis of high-dimensional data as well as reveal the contribution of descriptors in governing the distribution of data clusters. Particularly, we will be creating PCA scree plot, scores plot and loadings plot. This video is part of the [Python Data Science Project] series. If you're new here, it would mean the world to me if you would consider subscribing to this channel.


Sparse probabilistic projections

Neural Information Processing Systems

We present a generative model for performing sparse probabilistic projections, which includes sparse principal component analysis and sparse canonical correlation analysis as special cases. Sparsity is enforced by means of automatic relevance determination or by imposing appropriate prior distributions, such as generalised hyperbolic distributions. We derive a variational Expectation-Maximisation algorithm for the estimation of the hyperparameters and show that our novel probabilistic approach compares favourably to existing techniques. We illustrate how the proposed method can be applied in the context of cryptoanalysis as a pre-processing tool for the construction of template attacks. Papers published at the Neural Information Processing Systems Conference.