Principal Component Analysis
Principal Component Analysis Demystified
We see that the "Post Weekday" column has low variance while the "Lifetime Post Total Reach" column has comparatively high variance. If we apply PCA without standardizing the data, the "Lifetime Post Total Reach" column will dominate the calculation of eigenvectors and eigenvalues, and we will get biased principal components. We will therefore standardize the dataset using the RobustScaler from the sklearn library. sklearn also provides other scalers, such as StandardScaler and MinMaxScaler, which can be chosen as required. Unless specified otherwise, the number of principal components will equal the number of attributes.
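A minimal sketch of this step with sklearn (the file name below is a placeholder; only the two columns mentioned above come from the text):

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler
from sklearn.decomposition import PCA

# "facebook_posts.csv" is a hypothetical file name for the dataset discussed above
df = pd.read_csv("facebook_posts.csv")
X = df.select_dtypes("number")   # includes "Post Weekday" and "Lifetime Post Total Reach"

# RobustScaler centers by the median and scales by the IQR, limiting the
# influence of the high-variance "Lifetime Post Total Reach" column
X_scaled = RobustScaler().fit_transform(X)

# With n_components left unspecified, PCA keeps as many components as attributes
pca = PCA()
scores = pca.fit_transform(X_scaled)
print(pca.explained_variance_ratio_)
```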
Principal Component Analysis
"Machine intelligence is the last invention that humanity will ever need to make". The quote definitely makes it clear that machine learning is the future and vast opportunities and benefits for all. Let this be a fresh start for you to learn a really interesting algorithm in machine learning. As you all know, we often come across the problems of storing and processing huge data in machine learning tasks, as it's a time-consuming process and difficulties to interpret also arises. Not every feature of the data is necessary for predictions.
Robust Principal Component Analysis: A Median of Means Approach
Paul, Debolina, Chakraborty, Saptarshi, Das, Swagatam
Principal Component Analysis (PCA) is a fundamental tool for data visualization, denoising, and dimensionality reduction. It is widely popular in Statistics, Machine Learning, Computer Vision, and related fields. However, PCA is well known to fall prey to the presence of outliers and often fails to detect the true underlying low-dimensional structure within the dataset. Recent supervised learning methods, following the Median of Means (MoM) philosophy, have shown great success in dealing with outlying observations without much compromise to their large-sample theoretical properties. In this paper, we propose a PCA procedure based on the MoM principle. Called Median of Means Principal Component Analysis (MoMPCA), the proposed method is not only computationally appealing but also achieves optimal convergence rates under minimal assumptions. In particular, we explore the non-asymptotic error bounds of the obtained solution with the aid of Vapnik-Chervonenkis theory and Rademacher complexity, while making no assumption whatsoever on the outlying observations. The efficacy of the proposal is also thoroughly showcased through simulations and real data applications.
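As a rough illustration of the Median of Means idea applied to PCA (this is only a sketch of the general principle, not the MoMPCA estimator analyzed in the paper), one can fit PCA on disjoint blocks of the sample and keep the candidate whose median block-wise reconstruction error is smallest:

```python
import numpy as np

def mom_pca_sketch(X, n_components=2, n_blocks=10, seed=0):
    """Illustrative MoM-flavoured PCA: fit PCA per block, then select the
    candidate basis with the smallest median block-wise reconstruction error."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    blocks = np.array_split(rng.permutation(len(Xc)), n_blocks)

    candidates = []
    for b in blocks:
        # leading right singular vectors = top eigenvectors of the block covariance
        _, _, Vt = np.linalg.svd(Xc[b], full_matrices=False)
        candidates.append(Vt[:n_components])

    def median_error(V):
        errs = []
        for b in blocks:
            resid = Xc[b] - Xc[b] @ V.T @ V   # residual after projecting onto span(V)
            errs.append(np.mean(np.sum(resid ** 2, axis=1)))
        return np.median(errs)               # median over blocks damps outlier blocks

    return min(candidates, key=median_error)
```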
Principal Component Analysis (PCA) with Python Examples -- Tutorial
When implementing machine learning algorithms, adding more features can degrade performance. Increasing the number of features does not always improve classification accuracy, a phenomenon known as the curse of dimensionality. Hence, we apply dimensionality reduction to improve classification accuracy by selecting an optimal set of lower-dimensional features. Principal component analysis (PCA) is essential in data science, machine learning, data visualization, statistics, and other quantitative fields. To understand dimensionality reduction, it helps to be familiar with vectors, matrices and matrix transposes, eigenvalues, eigenvectors, and related concepts.
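A minimal example of this workflow, assuming sklearn and its bundled Iris dataset as a stand-in for the tutorial's data:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)    # standardize features before PCA

pca = PCA(n_components=2)                    # keep the two leading components
X_2d = pca.fit_transform(X_std)              # project 4-D data onto a 2-D subspace
print(pca.explained_variance_ratio_)         # variance captured by each component
```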
Spike and slab Bayesian sparse principal component analysis
Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction of high-dimensional data. Despite its massive popularity, there is still a lack of theoretically justifiable Bayesian sparse PCA that is computationally scalable. A major challenge is choosing a suitable prior for the loadings matrix, as principal components are mutually orthogonal. We propose a spike and slab prior that meets this orthogonality constraint and show that the posterior enjoys both theoretical and computational advantages. Two computational algorithms, the PX-CAVI and the PX-EM algorithms, are developed. Both algorithms use parameter expansion to deal with the orthogonality constraint and to accelerate their convergence. We found that the PX-CAVI algorithm has superior empirical performance to the PX-EM algorithm and to two other penalty methods for sparse PCA. The PX-CAVI algorithm is then applied to study a lung cancer gene expression dataset. The $\mathsf{R}$ package $\mathsf{VBsparsePCA}$, with an implementation of the algorithm, is available on the Comprehensive R Archive Network.
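For a rough point of comparison only, sklearn's SparsePCA gives a penalty-based sparse PCA of the kind the paper benchmarks against; it implements neither the spike-and-slab prior nor the PX-CAVI/PX-EM algorithms:

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))           # synthetic data for illustration only

# l1-penalized sparse PCA; alpha controls how aggressively loadings are zeroed out
spca = SparsePCA(n_components=3, alpha=1.0, random_state=0)
scores = spca.fit_transform(X)
print(np.mean(spca.components_ == 0))        # fraction of exactly-zero loadings
```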
Unlabeled Principal Component Analysis
Yao, Yunzhen, Peng, Liangzu, Tsakiris, Manolis C.
We consider the problem of principal component analysis from a data matrix where the entries of each column have undergone some unknown permutation, termed Unlabeled Principal Component Analysis (UPCA). Using algebraic geometry, we establish that for generic enough data, and up to a permutation of the coordinates of the ambient space, there is a unique subspace of minimal dimension that explains the data. We show that a permutation-invariant system of polynomial equations has finitely many solutions, with each solution corresponding to a row permutation of the ground-truth data matrix. Allowing for missing entries on top of permutations leads to the problem of unlabeled matrix completion, for which we give theoretical results of similar flavor. We also propose a two-stage algorithmic pipeline for UPCA suitable for the practically relevant case where only a fraction of the data has been permuted. Stage-I of this pipeline employs robust-PCA methods to estimate the ground-truth column-space. Equipped with the column-space, stage-II applies methods for linear regression without correspondences to restore the permuted data. A computational study reveals encouraging findings, including the ability of UPCA to handle face images from the Extended Yale-B database with arbitrarily permuted patches of arbitrary size in $0.3$ seconds on a standard desktop computer.
A Linearly Convergent Algorithm for Distributed Principal Component Analysis
Gang, Arpita, Bajwa, Waheed U.
Principal Component Analysis (PCA) is the workhorse tool for dimensionality reduction in this era of big data. While often overlooked, the purpose of PCA is not only to reduce data dimensionality, but also to yield features that are uncorrelated. This paper focuses on this dual objective of PCA, namely, dimensionality reduction and decorrelation of features, which requires estimating the eigenvectors of a data covariance matrix, as opposed to only estimating the subspace spanned by the eigenvectors. The ever-increasing volume of data in the modern world often requires storage of data samples across multiple machines, which precludes the use of centralized PCA algorithms. Although a few distributed solutions to the PCA problem have been proposed recently, convergence guarantees and/or communications overhead of these solutions remain a concern. With an eye towards communications efficiency, this paper introduces a feedforward neural network-based one time-scale distributed PCA algorithm termed Distributed Sanger's Algorithm (DSA) that estimates the eigenvectors of a data covariance matrix when data are distributed across an undirected and arbitrarily connected network of machines. Furthermore, the proposed algorithm is shown to converge linearly to a neighborhood of the true solution. Numerical results are also shown to demonstrate the efficacy of the proposed solution.
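For context, a sketch of the classical centralized Sanger's rule on which DSA is based (the consensus and communication steps of the distributed algorithm are not reproduced here):

```python
import numpy as np

def sanger_rule(X, n_components=2, lr=1e-3, n_epochs=50, seed=0):
    """Centralized Sanger's rule (Generalized Hebbian Algorithm) for estimating
    the leading eigenvectors of the data covariance, not merely their span."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                    # work with centered data
    W = 0.01 * rng.standard_normal((n_components, X.shape[1]))
    for _ in range(n_epochs):
        for x in X:
            y = W @ x                         # projections onto current estimates
            # Sanger update: dW = lr * (y x^T - lower_triangular(y y^T) W)
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    # rows of W converge (in direction) to the leading covariance eigenvectors
    return W
```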
Factor Analysis, Probabilistic Principal Component Analysis, Variational Inference, and Variational Autoencoder: Tutorial and Survey
Ghojogh, Benyamin, Ghodsi, Ali, Karray, Fakhri, Crowley, Mark
This is a tutorial and survey paper on factor analysis, probabilistic Principal Component Analysis (PCA), variational inference, and the Variational Autoencoder (VAE). These methods, which are tightly related, are dimensionality reduction and generative models. They assume that every data point is generated from, or caused by, a low-dimensional latent factor. By learning the parameters of the distribution of the latent space, the corresponding low-dimensional factors are found for the sake of dimensionality reduction. Owing to their stochastic and generative behaviour, these models can also be used to generate new data points in the data space. In this paper, we start with variational inference, where we derive the Evidence Lower Bound (ELBO) and Expectation Maximization (EM) for learning the parameters. Then, we introduce factor analysis, derive its joint and marginal distributions, and work out its EM steps. Probabilistic PCA is then explained as a special case of factor analysis, and its closed-form solutions are derived. Finally, VAE is explained, where the encoder, decoder, and sampling from the latent space are introduced. Training VAE using both EM and backpropagation is explained.
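As an illustration of one of the surveyed results, the well-known closed-form maximum-likelihood solution of probabilistic PCA can be computed from the eigendecomposition of the sample covariance; this is a sketch under standard assumptions, not the paper's own code:

```python
import numpy as np

def ppca_closed_form(X, q):
    """Closed-form ML solution of probabilistic PCA (Tipping & Bishop):
    sigma^2 is the mean of the discarded eigenvalues and
    W = U_q (Lambda_q - sigma^2 I)^{1/2}, up to an arbitrary rotation."""
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)              # sample covariance (d x d)
    eigvals, eigvecs = np.linalg.eigh(S)      # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    sigma2 = eigvals[q:].mean()               # noise variance from trailing eigenvalues
    W = eigvecs[:, :q] @ np.diag(np.sqrt(eigvals[:q] - sigma2))
    return W, sigma2
```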
Angular Embedding: A New Angular Robust Principal Component Analysis
As a widely used method in machine learning, principal component analysis (PCA) shows excellent properties for dimensionality reduction. However, PCA is sensitive to outliers, a serious problem that numerous Robust PCA (RPCA) variants have sought to address. Still, the existing state-of-the-art RPCA approaches cannot easily remove or tolerate outliers in a non-iterative manner. To tackle this issue, this paper proposes Angular Embedding (AE) to formulate a straightforward RPCA approach based on angular density, which is well suited to large-scale or high-dimensional data. Furthermore, a trimmed AE (TAE) is introduced to deal with data containing large-scale outliers. Extensive experiments on both synthetic and real-world datasets with vector-level or pixel-level outliers demonstrate that the proposed AE/TAE outperforms the state-of-the-art RPCA-based methods.