Principal Component Analysis
ReFACTor: Practical Low-Rank Matrix Estimation Under Column-Sparsity
Matan Gavish, Regev Schweiger, Elior Rahmani, Eran Halperin
Various problems in data analysis and statistical genetics call for recovery of a column-sparse, low-rank matrix from noisy observations. We propose ReFACTor, a simple variation of the classical Truncated Singular Value Decomposition (TSVD) algorithm. In contrast to previous sparse principal component analysis (PCA) algorithms, our algorithm provably recovers a low-rank signal matrix better, and often significantly better, than the widely used TSVD, making it the algorithm of choice whenever column-sparsity is suspected. Empirically, we observe that ReFACTor consistently outperforms TSVD even when the underlying signal is not sparse, suggesting that it is generally safe to use ReFACTor in place of TSVD and PCA. The algorithm is extremely simple to implement, and its running time is dominated by the runtime of PCA, making it as practical as standard principal component analysis.
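As a rough illustration of the idea described above, here is a minimal NumPy sketch of TSVD combined with residual-based column selection. It is not necessarily the authors' exact procedure; the rank k, the number of retained columns t, and the scoring rule are assumptions made for the example.

```python
import numpy as np

def tsvd(X, k):
    # Rank-k truncated SVD approximation of X.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def refactor_sketch(X, k, t):
    # Hypothetical column-sparse variant of TSVD: keep the t columns
    # best explained by a rank-k fit, then re-estimate on those columns.
    X_k = tsvd(X, k)
    residual = np.linalg.norm(X - X_k, axis=0)   # per-column misfit
    cols = np.argsort(residual)[:t]              # best-fit columns
    X_hat = np.zeros_like(X, dtype=float)
    X_hat[:, cols] = tsvd(X[:, cols], k)         # refit on selected columns
    return X_hat, cols
```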
Introduction to Principal Component Analysis
This formula-free summary gives a short overview of how PCA (principal component analysis) works for dimension reduction, that is, how to replace a large set of n features (also called variables) with a much smaller set of k features, with k much smaller than n. Note that PCA does not pick a subset of the original features: it transforms them into k new features that are linear combinations of the original ones. This smaller set of k features built with PCA is the best such set, in the sense that it minimizes the variance of the residual noise when fitting the data to a linear model.
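A minimal scikit-learn illustration of the reduction just described, using a synthetic matrix with 500 samples and n = 20 features reduced to k = 3 (the sizes are arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))           # 500 samples, n = 20 features

pca = PCA(n_components=3)                # k = 3 new features
Z = pca.fit_transform(X)                 # linear combinations of the originals

print(Z.shape)                           # (500, 3)
print(pca.explained_variance_ratio_)     # variance captured per component
```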
Introduction to Principal Component Analysis
The sheer size of data in the modern age is not only a challenge for computer hardware but also the main bottleneck for the performance of many machine learning algorithms. PCA is a statistical method used to reduce the number of variables in a dataset. Its main goal is to identify patterns in data by detecting the correlations between variables; reducing the dimensionality only makes sense if strong correlations between variables exist.
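To make the covariance view concrete, here is a from-scratch sketch of PCA via an eigendecomposition of the covariance matrix (the function name is illustrative):

```python
import numpy as np

def pca_from_scratch(X, k):
    # Project X onto the k leading eigenvectors of its covariance matrix.
    Xc = X - X.mean(axis=0)                  # center each variable
    C = np.cov(Xc, rowvar=False)             # pairwise covariances
    eigvals, eigvecs = np.linalg.eigh(C)     # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    return Xc @ eigvecs[:, top]
```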
Unsupervised Learning in SAS Visual Data Mining and Machine Learning
In a previous post I summarized the tasks and procedures available in SAS Viya Data Mining and Machine Learning. In this post, I'll dive into the unsupervised learning category, which currently hosts several tasks: k-means, k-modes, and k-prototypes clustering, outlier detection, and a few variants of principal component analysis. In unsupervised learning there are no known labels (outcomes), only attributes (inputs); examples include clustering, association, and segmentation. The algorithms find high-density areas (in multidimensional space) whose points are more or less similar to each other, and identify structures in the data that separate these areas.
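The SAS interface itself is not reproduced here; as a generic stand-in, the following scikit-learn snippet shows the k-means idea of locating high-density areas:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two synthetic high-density areas in 2-D space.
X = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)),
               rng.normal(4.0, 0.5, size=(100, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)    # one center per dense region
```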
Unsupervised learning of phase transitions: from principal component analysis to variational autoencoders
We employ unsupervised machine learning techniques to learn the latent parameters which best describe states of the two-dimensional Ising model and the three-dimensional XY model. These methods range from principal component analysis to artificial neural network based variational autoencoders. The states are sampled using a Monte Carlo simulation above and below the critical temperature. We find that the predicted latent parameters correspond to the known order parameters. The latent representations of the states of the models in question are clustered, which makes it possible to identify phases without prior knowledge of their existence or the underlying Hamiltonian. Furthermore, we find that the reconstruction loss function can be used as a universal identifier for phase transitions.
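A toy version of the PCA part of this pipeline, with random ordered/disordered spin configurations standing in for true Monte Carlo samples of the Ising model (the lattice size and sampling shortcut are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
L = 16                                    # 16x16 lattice, flattened

# Crude stand-ins for Monte Carlo samples above and below criticality.
disordered = rng.choice([-1, 1], size=(200, L * L))
ordered = np.sign(rng.normal(1.5, 1.0, size=(200, L * L)))
states = np.vstack([disordered, ordered])

z = PCA(n_components=1).fit_transform(states).ravel()
m = states.mean(axis=1)                   # magnetization (order parameter)
print(abs(np.corrcoef(z, m)[0, 1]))       # close to 1: PCA recovers it
```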
Maximally Correlated Principal Component Analysis
In the era of big data, reducing data dimensionality is critical in many areas of science. The widely used Principal Component Analysis (PCA) addresses this problem by computing a low dimensional data embedding that maximally explains the variance of the data. However, PCA has two major weaknesses: first, it only considers linear correlations among variables (features), and second, it is not suitable for categorical data. We resolve these issues by proposing Maximally Correlated Principal Component Analysis (MCPCA). MCPCA computes transformations of the variables whose covariance matrix has the largest Ky Fan norm. The variable transformations are unknown, may be nonlinear, and are computed as part of an optimization. MCPCA can also be viewed as a multivariate extension of Maximal Correlation. For jointly Gaussian variables we show that the covariance matrix corresponding to the identity (or the negative of the identity) transformations majorizes covariance matrices of non-identity functions. Using this result we characterize global MCPCA optimizers for nonlinear functions of jointly Gaussian variables for every rank constraint. For categorical variables we characterize global MCPCA optimizers for the rank one constraint based on the leading eigenvector of a matrix computed using pairwise joint distributions. For a general rank constraint we propose a block coordinate descent algorithm and show its convergence to stationary points of the MCPCA optimization. We compare MCPCA with PCA and other state-of-the-art dimensionality reduction methods, including Isomap, LLE, multilayer autoencoders (neural networks), kernel PCA, probabilistic PCA, and diffusion maps, on several synthetic and real datasets. We show that MCPCA consistently provides improved performance compared to the other methods.
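For reference, the Ky Fan k-norm at the heart of the MCPCA objective is simply the sum of the k largest singular values. A small sketch (the optimization over variable transformations is not shown):

```python
import numpy as np

def ky_fan_norm(A, k):
    # Ky Fan k-norm: sum of the k largest singular values of A.
    s = np.linalg.svd(A, compute_uv=False)
    return s[:k].sum()

# MCPCA (schematically) seeks transformations f_1, ..., f_p of the
# variables that maximize the Ky Fan norm of the resulting covariance.
X = np.random.default_rng(3).normal(size=(200, 5))
print(ky_fan_norm(np.cov(X, rowvar=False), k=2))
```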
Policy Search with High-Dimensional Context Variables
Voot Tangkaratt (The University of Tokyo) | Herke van Hoof (McGill University) | Simone Parisi (Technical University of Darmstadt) | Gerhard Neumann (University of Lincoln) | Jan Peters (Max Planck Institute for Intelligent Systems) | Masashi Sugiyama (The University of Tokyo)
Direct contextual policy search methods learn to improve policy parameters and simultaneously generalize these parameters to different context or task variables. However, learning from high-dimensional context variables, such as camera images, is still a prominent problem in many real-world tasks. A naive application of unsupervised dimensionality reduction methods to the context variables, such as principal component analysis, is insufficient as task-relevant input may be ignored. In this paper, we propose a contextual policy search method in the model-based relative entropy stochastic search framework with integrated dimensionality reduction. We learn a model of the reward that is locally quadratic in both the policy parameters and the context variables. Furthermore, we perform supervised linear dimensionality reduction on the context variables by nuclear norm regularization. The experimental results show that the proposed method outperforms naive dimensionality reduction via principal component analysis and a state-of-the-art contextual policy search method.
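The paper's full optimization is not reproduced here; the snippet below only illustrates the standard proximal step behind nuclear norm regularization, singular value soft-thresholding, which is what drives such regularized estimates toward low rank:

```python
import numpy as np

def svt(A, tau):
    # Proximal operator of tau * ||.||_* (nuclear norm): soft-threshold
    # the singular values, pushing A toward low rank as tau grows.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

A = np.random.default_rng(4).normal(size=(6, 8))
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(svt(A, 2.0)))
```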
Towards multiple kernel principal component analysis for integrative analysis of tumor samples
Nora K. Speicher, Nico Pfeifer
Personalized treatment of patients based on tissue-specific cancer subtypes has strongly increased the efficacy of the chosen therapies. Even though the amount of data measured for cancer patients has increased in recent years, most cancer subtypes are still diagnosed based on individual data sources (e.g. gene expression data). We propose an unsupervised data integration method based on kernel principal component analysis. Principal component analysis is one of the most widely used techniques in data analysis. Unfortunately, the straightforward multiple-kernel extension of this method leads to the use of only one of the input matrices, which does not fit the goal of gaining information from all data sources. We therefore present a scoring function to determine the impact of each input matrix. The approach enables visualizing the integrated data and subsequent clustering for cancer subtype identification. Due to the nature of the method, no free parameters have to be set. We apply the methodology to five different cancer data sets and demonstrate its advantages in terms of results and usability.
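A minimal sketch of the multiple-kernel idea: combine per-source kernel matrices with weights (fixed by hand here; the paper derives them from a scoring function) and run kernel PCA on the result. The data sources, kernels, and weights are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

rng = np.random.default_rng(5)
# Two stand-in data sources measured on the same 100 samples.
X1 = rng.normal(size=(100, 30))
X2 = rng.normal(size=(100, 12))

w1, w2 = 0.7, 0.3                          # hypothetical source weights
K = w1 * rbf_kernel(X1) + w2 * linear_kernel(X2)

Z = KernelPCA(n_components=2, kernel="precomputed").fit_transform(K)
print(Z.shape)                             # (100, 2) integrated embedding
```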
Correlated-PCA: Principal Components' Analysis when Data and Noise are Correlated
Given a matrix of observed data, Principal Components Analysis (PCA) computes a small number of orthogonal directions that contain most of its variability. Provably accurate solutions for PCA have been in use for a long time. However, to the best of our knowledge, all existing theoretical guarantees for it assume that the data and the corrupting noise are mutually independent, or at least uncorrelated. This is often valid in practice, but not always. In this paper, we study the PCA problem in the setting where the data and noise can be correlated. Such noise is often also referred to as "data-dependent noise". We obtain a correctness result for the standard eigenvalue decomposition (EVD) based solution to PCA under simple assumptions on the data-noise correlation. We also develop and analyze a generalization of EVD, cluster-EVD, which improves upon EVD in certain regimes.
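A toy instance of the setting described above, with noise constructed to be correlated with the data, and the standard EVD-based solution applied to it (the dimensions and noise model are assumptions made for the example):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, r = 1000, 20, 2
P = np.linalg.qr(rng.normal(size=(p, r)))[0]    # true r-dim principal subspace
X = (rng.normal(size=(n, r)) * [3.0, 2.0]) @ P.T
Y = X + X @ (0.1 * rng.normal(size=(p, p))).T   # data-dependent noise

# EVD-based PCA: top-r eigenvectors of the empirical covariance.
eigvals, eigvecs = np.linalg.eigh(Y.T @ Y / n)
P_hat = eigvecs[:, -r:]

# Subspace error between estimate and truth (small for mild correlation).
print(np.linalg.norm(P_hat @ P_hat.T - P @ P.T, 2))
```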