AITopics | Principal Component Analysis

Collaborating Authors

Principal Component Analysis

News Overviews Instructional Materials AI-Alerts Classics

Nonlinear Principal Component Analysis with Random Bernoulli Features for Process Monitoring

arXiv.org Machine LearningMar-16-2025

The process generates substantial amounts of data with highly complex structures, leading to the development of numerous nonlinear statistical methods. However, most of these methods rely on computations involving large-scale dense kernel matrices. This dependence poses significant challenges in meeting the high computational demands and real-time responsiveness required by online monitoring systems. To alleviate the computational burden of dense large-scale matrix multiplication, we incorporate the bootstrap sampling concept into random feature mapping and propose a novel random Bernoulli principal component analysis method to efficiently capture nonlinear patterns in the process. We derive a convergence bound for the kernel matrix approximation constructed using random Bernoulli features, ensuring theoretical robustness. Subsequently, we design four fast process monitoring methods based on random Bernoulli principal component analysis to extend its nonlinear capabilities for handling diverse fault scenarios. Finally, numerical experiments and real-world data analyses are conducted to evaluate the performance of the proposed methods. Results demonstrate that the proposed methods offer excellent scalability and reduced computational complexity, achieving substantial cost savings with minimal performance loss compared to traditional kernel-based approaches.

artificial intelligence, machine learning, nonlinear principal component analysis, (2 more...)

arXiv.org Machine Learning

2503.12456

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.80)
Information Technology > Data Science (0.53)

Add feedback

Robust Multilinear Principal Component Analysis

Hirari, Mehdi, Centofanti, Fabio, Hubert, Mia, Van Aelst, Stefan

arXiv.org Machine LearningMar-10-2025

Multilinear Principal Component Analysis (MPCA) is an important tool for analyzing tensor data. It performs dimension reduction similar to PCA for multivariate data. However, standard MPCA is sensitive to outliers. It is highly influenced by observations deviating from the bulk of the data, called casewise outliers, as well as by individual outlying cells in the tensors, so-called cellwise outliers. This latter type of outlier is highly likely to occur in tensor data, as tensors typically consist of many cells. This paper introduces a novel robust MPCA method that can handle both types of outliers simultaneously, and can cope with missing values as well. This method uses a single loss function to reduce the influence of both casewise and cellwise outliers. The solution that minimizes this loss function is computed using an iteratively reweighted least squares algorithm with a robust initialization. Graphical diagnostic tools are also proposed to identify the different types of outliers that have been found by the new robust MPCA method. The performance of the method and associated graphical displays is assessed through simulations and illustrated on two real datasets.

artificial intelligence, machine learning, tensor, (18 more...)

arXiv.org Machine Learning

2503.07327

Country:

North America > United States (0.14)
Europe > Italy (0.14)
Europe > Belgium (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.61)

Add feedback

Provable Model-Parallel Distributed Principal Component Analysis with Parallel Deflation

Liao, Fangshuo, Su, Wenyi, Kyrillidis, Anastasios

arXiv.org Artificial IntelligenceFeb-24-2025

We study a distributed Principal Component Analysis (PCA) framework where each worker targets a distinct eigenvector and refines its solution by updating from intermediate solutions provided by peers deemed as "superior". Drawing intuition from the deflation method in centralized eigenvalue problems, our approach breaks the sequential dependency in the deflation steps and allows asynchronous updates of workers, while incurring only a small communication cost. To our knowledge, a gap in the literature -- the theoretical underpinning of such distributed, dynamic interactions among workers -- has remained unaddressed. This paper offers a theoretical analysis explaining why, how, and when these intermediate, hierarchical updates lead to practical and provable convergence in distributed environments. Despite being a theoretical work, our prototype implementation demonstrates that such a distributed PCA algorithm converges effectively and in scalable way: through experiments, our proposed framework offers comparable performance to EigenGame-$\mu$, the state-of-the-art model-parallel PCA solver.

artificial intelligence, eigenvector, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2502.17615

Country: North America > United States (0.28)

Genre: Research Report > Promising Solution (0.48)

Industry: Banking & Finance > Economy (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.61)

Add feedback

Rethinking Diverse Human Preference Learning through Principal Component Analysis

Luo, Feng, Yang, Rui, Sun, Hao, Deng, Chunyuan, Yao, Jiarui, Shen, Jingyan, Zhang, Huan, Chen, Hanjie

arXiv.org Artificial IntelligenceFeb-18-2025

Understanding human preferences is crucial for improving foundation models and building personalized AI systems. However, preferences are inherently diverse and complex, making it difficult for traditional reward models to capture their full range. While fine-grained preference data can help, collecting it is expensive and hard to scale. In this paper, we introduce Decomposed Reward Models (DRMs), a novel approach that extracts diverse human preferences from binary comparisons without requiring fine-grained annotations. Our key insight is to represent human preferences as vectors and analyze them using Principal Component Analysis (PCA). By constructing a dataset of embedding differences between preferred and rejected responses, DRMs identify orthogonal basis vectors that capture distinct aspects of preference. These decomposed rewards can be flexibly combined to align with different user needs, offering an interpretable and scalable alternative to traditional reward models. We demonstrate that DRMs effectively extract meaningful preference dimensions (e.g., helpfulness, safety, humor) and adapt to new users without additional training. Our results highlight DRMs as a powerful framework for personalized and interpretable LLM alignment.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.13131

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.60)

Add feedback

Correlated-PCA: Principal Components' Analysis when Data and Noise are Correlated

Neural Information Processing SystemsFeb-11-2025, 20:34:55 GMT

Given a matrix of observed data, Principal Components Analysis (PCA) computes a small number of orthogonal directions that contain most of its variability. Provably accurate solutions for PCA have been in use for a long time. However, to the best of our knowledge, all existing theoretical guarantees for it assume that the data and the corrupting noise are mutually independent, or at least uncorrelated. This is valid in practice often, but not always. In this paper, we study the PCA problem in the setting where the data and noise can be correlated.

artificial intelligence, machine learning, principal component analysis, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.30)

Add feedback

Optimal Sparse Linear Encoders and Sparse PCA

Neural Information Processing SystemsFeb-11-2025, 18:35:11 GMT

Principal components analysis (PCA) is the optimal linear encoder of data. Sparse linear encoders (e.g., sparse PCA) produce more interpretable features that can promote better generalization. We answer both questions by providing the first polynomial-time algorithms to construct \emph{optimal} sparse linear auto-encoders; additionally, we demonstrate the performance of our algorithms on real data.

artificial intelligence, machine learning, optimal sparse linear encoder, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.34)

Add feedback

Unlabeled Principal Component Analysis

Neural Information Processing SystemsFeb-11-2025, 04:59:48 GMT

We introduce robust principal component analysis from a data matrix in which the entries of its columns have been corrupted by permutations, termed Unlabeled Principal Component Analysis (UPCA). Using algebraic geometry, we establish that UPCA is a well-defined algebraic problem in the sense that the only matrices of minimal rank that agree with the given data are row-permutations of the ground-truth matrix, arising as the unique solutions of a polynomial system of equations. Further, we propose an efficient two-stage algorithmic pipeline for UPCA suitable for the practically relevant case where only a fraction of the data have been permuted. Stage-I employs outlier-robust PCA methods to estimate the ground-truth column-space. Equipped with the column-space, Stage-II applies recent methods for unlabeled sensing to restore the permuted data. Experiments on synthetic data, face images, educational and medical records reveal the potential of UPCA for applications such as data privatization and record linkage.

artificial intelligence, machine learning, permutation, (16 more...)

Neural Information Processing Systems

Industry:

Health & Medicine (0.88)
Information Technology > Security & Privacy (0.68)
Education (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.81)

Add feedback

Quasicyclic Principal Component Analysis

Rumsey, Susanna E., Draper, Stark C., Kschischang, Frank R.

arXiv.org Machine LearningFeb-7-2025

We present quasicyclic principal component analysis (QPCA), a generalization of principal component analysis (PCA), that determines an optimized basis for a dataset in terms of families of shift-orthogonal principal vectors. This is of particular interest when analyzing cyclostationary data, whose cyclic structure is not exploited by the standard PCA algorithm. We first formulate QPCA as an optimization problem, which we show may be decomposed into a series of PCA problems in the frequency domain. We then formalize our solution as an explicit algorithm and analyze its computational complexity. Finally, we provide some examples of applications of QPCA to cyclostationary signal processing data, including an investigation of carrier pulse recovery, a presentation of methods for estimating an unknown oversampling rate, and a discussion of an appropriate approach for pre-processing data with a non-integer oversampling rate in order to better apply the QPCA algorithm.

artificial intelligence, machine learning, vector, (14 more...)

arXiv.org Machine Learning

2502.05297

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.82)

Add feedback

Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing

Neural Information Processing SystemsJan-27-2025, 18:40:27 GMT

Our second, which attains a slightly worse approximation factor, runs in nearly-linear time and sample complexity under a mild spectral gap assumption.

artificial intelligence, machine learning, principal component analysis, (18 more...)

Neural Information Processing Systems

Country:

Europe (0.92)
North America > United States > California (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.40)

Add feedback

Review for NeurIPS paper: Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing

Neural Information Processing SystemsJan-27-2025, 18:40:21 GMT

Summary and Contributions: This paper studies the problem of Robust PCA of a subgaussian distribution. Specifically, one is given samples X_1,X_2,...,X_n from a subgaussian distribution, such that an eps-fraction of the samples have been arbitrarily corruption (modified by an adversary), the goal is to approximately recover the top singular vector of the covariance matrix Sigma. Here, approximately recover means to find a vector u such that u T Sigma u (1 - gamma) Sigma _op, where Sigma _op is the operator norm of Sigma. The main result of this paper are runtime and sample complexity efficient algorithms for this task. Specifically, they show 1) An algorithm that achieves error gamma O(eps log(1/eps)) in polynomial time, specifically in tilde{O}(n d 2/eps) time, using n Omega(d /*(eps log(1/eps)) 2) samples.

algorithm, analysis and width-independent schatten packing, robust sub-gaussian principal component analysis, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.40)

Add feedback