Principal Component Analysis
Global Weighted Tensor Nuclear Norm for Tensor Robust Principal Component Analysis
Wang, Libin, Wang, Yulong, Wang, Shiyuan, Liu, Youheng, Hu, Yutao, Chen, Longlong, Chen, Hong
Tensor Robust Principal Component Analysis (TRPCA), which aims to recover a low-rank tensor corrupted by sparse noise, has attracted much attention in many real applications. This paper develops a new Global Weighted TRPCA method (GWTRPCA), the first approach to simultaneously consider the significance of intra-frontal-slice and inter-frontal-slice singular values in the Fourier domain. Exploiting this global information, GWTRPCA assigns smaller weights to the larger singular values and thereby penalizes them less, so the method can recover the low-tubal-rank components more accurately. Moreover, since the weight setting plays a crucial role in the success of GWTRPCA, we propose an effective adaptive weight learning strategy based on a Modified Cauchy Estimator (MCE). To implement GWTRPCA, we devise an optimization algorithm based on the Alternating Direction Method of Multipliers (ADMM). Experiments on real-world datasets validate the effectiveness of the proposed method.
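As a rough illustration of the core mechanism this abstract describes, the sketch below applies weighted singular value thresholding to each frontal slice of a tensor in the Fourier domain, with an inverse-magnitude weight rule so that larger singular values are shrunk less. The weight rule, parameter values, and function names are illustrative assumptions, not the authors' GWTRPCA or MCE scheme.

```python
import numpy as np

def weighted_tsvt(X, tau, eps=1e-8):
    """Weighted tensor singular value thresholding (illustrative sketch).

    SVDs each frontal slice of X in the Fourier domain and shrinks the
    singular values with weights inversely proportional to their magnitude,
    so larger singular values are penalized less.
    """
    Xf = np.fft.fft(X, axis=2)                 # move to the Fourier domain
    Yf = np.zeros_like(Xf)
    for k in range(X.shape[2]):                # threshold each frontal slice
        U, s, Vt = np.linalg.svd(Xf[:, :, k], full_matrices=False)
        w = 1.0 / (s + eps)                    # smaller weights for larger values
        s_shrunk = np.maximum(s - tau * w, 0.0)
        Yf[:, :, k] = (U * s_shrunk) @ Vt
    return np.real(np.fft.ifft(Yf, axis=2))

# Toy usage: a low-tubal-rank-ish tensor corrupted by sparse noise.
rng = np.random.default_rng(0)
L = np.einsum('ir,jr,kr->ijk', rng.normal(size=(30, 3)),
              rng.normal(size=(30, 3)), rng.normal(size=(20, 3)))
S = (rng.random(L.shape) < 0.05) * rng.normal(scale=5.0, size=L.shape)
L_hat = weighted_tsvt(L + S, tau=5.0)
```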
Unsupervised High Impedance Fault Detection Using Autoencoder and Principal Component Analysis
Liu, Yingxiang, Razeghi-Jahromi, Mohammad, Stoupis, James
Detection of high impedance faults (HIFs) has been one of the biggest challenges in the power distribution network. The low current magnitude and diverse characteristics of HIFs make them difficult to detect with over-current relays. Recently, data-driven methods based on machine learning models have been gaining popularity in HIF detection due to their ability to learn complex patterns from data. Most machine learning-based detection methods adopt supervised learning techniques that distinguish HIFs from normal load conditions by classification, which relies on a large amount of data collected during HIFs. However, HIF measurements are difficult to acquire in the real world. As a result, the reliability and generalization of classification methods are limited when load profiles and faults are not represented in the training data. This paper therefore proposes an unsupervised HIF detection framework using autoencoder- and principal component analysis-based monitoring techniques. The proposed method detects HIFs by monitoring changes in the correlation structure of the current waveforms that deviate from normal load behavior. Its performance is tested on real data collected from a 4.16 kV distribution system and compared against a commercially available HIF detection solution. The numerical results demonstrate that the proposed method outperforms the commercial technique while maintaining high security by not raising false alarms under normal load conditions.
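The autoencoder half of the framework needs a trained network, but the PCA-based monitoring idea can be sketched on its own: fit PCA on waveforms from normal load, then flag windows whose residual energy (the squared prediction error, or Q statistic) exceeds a control limit learned from the normal data. Everything below, including the synthetic waveforms and the 99.5th-percentile threshold, is an illustrative assumption rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.decomposition import PCA

# Fit PCA on current-waveform windows recorded under normal load (rows are
# windows, columns are samples); synthetic 60 Hz data stands in for field data.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 1024, endpoint=False)
normal = np.sin(2 * np.pi * 60 * t) + 0.02 * rng.normal(size=(500, 1024))
pca = PCA(n_components=5).fit(normal)

def spe(X):
    """Squared prediction error (Q statistic) of each row under the PCA model."""
    residual = X - pca.inverse_transform(pca.transform(X))
    return np.sum(residual ** 2, axis=1)

threshold = np.percentile(spe(normal), 99.5)   # illustrative control limit

# A distorted waveform breaks the learned correlation structure, so its
# residual energy lands far above the normal-load threshold.
faulty = np.sin(2 * np.pi * 60 * t) + 0.3 * np.sin(2 * np.pi * 180 * t) ** 3
print(spe(faulty[None, :]) > threshold)        # [ True]
```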
A Beginner's Guide to Principal Component Analysis
In principal component analysis, a principal component is a new feature that is constructed from a linear combination of the original features in a dataset. The principal components are ordered such that the first principal component has the highest possible variance (i.e., the greatest amount of spread or dispersion in the data), and each subsequent component in turn has the highest variance possible under the constraint that it is orthogonal (i.e., uncorrelated) to the previous components. The idea behind PCA is to reduce the dimensionality of a dataset by projecting the data onto a lower-dimensional space, while still preserving as much of the variance in the data as possible. This is done by selecting a smaller number of principal components that capture the most important information in the data, and discarding the remaining, less important components. In this way, PCA can be used to identify patterns and relationships in high-dimensional data, and to visualize data in a lower-dimensional space for easier interpretation.
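The description above maps directly onto a few lines of linear algebra. A minimal from-scratch sketch (NumPy, with illustrative synthetic data): center the data, eigendecompose the covariance matrix, and project onto the top-k eigenvectors.

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components (from scratch)."""
    X_centered = X - X.mean(axis=0)                   # center each feature
    cov = np.cov(X_centered, rowvar=False)            # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalue order
    order = np.argsort(eigvals)[::-1]                 # sort by variance, descending
    components = eigvecs[:, order[:k]]                # orthogonal directions
    explained = eigvals[order[:k]] / eigvals.sum()    # share of total variance
    return X_centered @ components, explained

# Toy data: 5 correlated features that really live in ~3 dimensions.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 5))
Z, ratio = pca(X, k=2)
print(Z.shape, ratio)   # (200, 2) plus each component's variance share
```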
Quasi-parametric rates for Sparse Multivariate Functional Principal Components Analysis
This work aims to give non-asymptotic results for estimating the first principal component of a multivariate random process. We first define the covariance function and the covariance operator in the multivariate case. We then define a projection operator. This operator can be seen as a reconstruction step from the raw data in the functional data analysis context. Next, we show that the eigenelements can be expressed as the solution to an optimization problem, and we introduce the LASSO variant of this optimization problem and the associated plugin estimator. Finally, we assess the estimator's accuracy. We establish a minimax lower bound on the mean square reconstruction error of the eigenelement, which proves that the procedure has an optimal variance in the minimax sense.
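The paper's estimator is specific to multivariate functional data, but the general idea of attaching a LASSO (l1) penalty to a principal-component optimization problem can be illustrated with scikit-learn's SparsePCA, a related non-functional formulation; the data and penalty strength below are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import SparsePCA

# SparsePCA solves a related l1-penalized PCA problem on plain multivariate
# data (not the functional estimator analyzed in the paper); alpha controls
# the strength of the LASSO penalty on the component loadings.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))
X[:, :5] += 3.0 * rng.normal(size=(100, 1))    # shared signal on 5 features

spca = SparsePCA(n_components=1, alpha=1.0, random_state=0).fit(X)
print(np.nonzero(spca.components_[0])[0])      # loadings concentrate on 0..4
```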
Domain Adaptation Principal Component Analysis: base linear method for learning with out-of-distribution data
Mirkes, Evgeny M, Bac, Jonathan, Fouché, Aziz, Stasenko, Sergey V., Zinovyev, Andrei, Gorban, Alexander N.
Domain adaptation is a popular paradigm in modern machine learning which aims at tackling the problem of divergence (or shift) between the labeled training and validation datasets (source domain) and a potentially large unlabeled dataset (target domain). The task is to embed both datasets into a common space in which the source dataset is informative for training while the divergence between source and target is minimized. The most popular domain adaptation solutions are based on training neural networks that combine classification and adversarial learning modules, frequently making them both data-hungry and difficult to train. We present a method called Domain Adaptation Principal Component Analysis (DAPCA) that identifies a linear reduced data representation useful for solving the domain adaptation task. The DAPCA algorithm introduces positive and negative weights between pairs of data points and generalizes the supervised extension of principal component analysis. DAPCA is an iterative algorithm that solves a simple quadratic optimization problem at each iteration. The convergence of the algorithm is guaranteed, and the number of iterations is small in practice. We validate the suggested algorithm on previously proposed benchmarks for the domain adaptation task, and we show the benefit of using DAPCA for analyzing single-cell omics datasets in biomedical applications. Overall, DAPCA can serve as a practical preprocessing step in many machine learning applications, producing reduced dataset representations that take into account the possible divergence between source and target domains.
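The published DAPCA update rule is not reproduced here, but the family it generalizes, PCA driven by positive and negative weights between pairs of points, reduces to an eigenproblem on a weighted scatter matrix, as the sketch below shows; the weight assignments and data are illustrative assumptions.

```python
import numpy as np

def weighted_pca(X, W, k):
    """PCA generalized with pairwise weights (illustrative sketch).

    Finds directions maximizing sum_ij W[i,j] * (projected distance of the
    pair)^2, an eigenproblem on X^T L X with L the graph Laplacian of W.
    Positive weights push a pair apart in the projection, negative weights
    pull it together; W of all ones recovers plain PCA up to scaling.
    """
    Xc = X - X.mean(axis=0)
    L = np.diag(W.sum(axis=1)) - W               # Laplacian of the weight graph
    eigvals, eigvecs = np.linalg.eigh(Xc.T @ L @ Xc)
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]

# Toy usage: pull same-label pairs together, push different-label pairs apart.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = np.repeat([0, 1, 2], 20)
W = np.where(y[:, None] == y[None, :], -1.0, 1.0)
P = weighted_pca(X, W, k=2)                      # a (10, 2) projection matrix
```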
Principal Component Analysis for Dimensionality Reduction in Python - MachineLearningMastery.com
Reducing the number of input variables for a predictive model is referred to as dimensionality reduction. Fewer input variables can result in a simpler predictive model that may have better performance when making predictions on new data. Perhaps the most popular technique for dimensionality reduction in machine learning is Principal Component Analysis, or PCA for short. This is a technique that comes from the field of linear algebra and can be used as a data preparation technique to create a projection of a dataset prior to fitting a model. In this tutorial, you will discover how to use PCA for dimensionality reduction when developing predictive models.
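In the spirit of the tutorial, a short scikit-learn sketch of PCA as a data-preparation step inside a modeling pipeline (the dataset and component count are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# PCA as data preparation: project the inputs onto 10 components, then fit
# the predictive model on the projection.
X, y = make_classification(n_samples=1000, n_features=30, n_informative=10,
                           random_state=7)
model = Pipeline([('pca', PCA(n_components=10)),
                  ('clf', LogisticRegression(max_iter=1000))])
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```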
Intermediate Machine Learning: Principal Component Analysis (PCA) - PythonAlgos
Welcome to the third module in our Machine Learning series. So far we've covered Linear Regression and Logistic Regression. Just to recap, Linear Regression is the simplest implementation of continuous prediction (i.e., predicting a numeric value). Now let's get into something a little more complex – Principal Component Analysis (PCA) in Python. PCA is a dimensionality reduction technique.
Principal Component Analysis (PCA) for Machine Learning
Sometimes a dataset contains many features, a situation known as high dimensionality. High-dimensional datasets can cause a number of issues; the most common is overfitting, where the model fails to generalize beyond the training data. To deal with high dimensionality, we employ dimensionality reduction techniques, and one of the most widely used in machine learning is Principal Component Analysis (PCA).
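A common practical guard, shown below with scikit-learn on an illustrative dataset, is to keep only as many components as needed to explain a chosen fraction of the variance:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Keep only as many principal components as needed to explain 95% of the
# variance, rather than hand-picking a fixed number.
X, _ = load_digits(return_X_y=True)       # 64 features
pca = PCA(n_components=0.95)              # fraction of variance to retain
X_reduced = pca.fit_transform(X)
print(X_reduced.shape[1], pca.explained_variance_ratio_.sum())
```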
High Dimensional Bayesian Optimization with Kernel Principal Component Analysis
Antonov, Kirill, Raponi, Elena, Wang, Hao, Doerr, Carola
Bayesian Optimization (BO) is a surrogate-based global optimization strategy that relies on a Gaussian Process regression (GPR) model to approximate the objective function and an acquisition function to suggest candidate points. It is well-known that BO does not scale well for high-dimensional problems because the GPR model requires substantially more data points to achieve sufficient accuracy and acquisition optimization becomes computationally expensive in high dimensions. Several recent works aim at addressing these issues, e.g., methods that implement online variable selection or conduct the search on a lower-dimensional sub-manifold of the original search space. Building on our previous work on PCA-BO, which learns a linear sub-manifold, this paper proposes a novel kernel PCA-assisted BO (KPCA-BO) algorithm that embeds a non-linear sub-manifold in the search space and performs BO on this sub-manifold. Intuitively, constructing the GPR model on a lower-dimensional sub-manifold helps improve the modeling accuracy without requiring much more data from the objective function. Our approach also defines the acquisition function on the lower-dimensional sub-manifold, making the acquisition optimization more manageable. We compare the performance of KPCA-BO to vanilla BO and to PCA-BO on the multi-modal problems of the COCO/BBOB benchmark suite. Empirical results show that KPCA-BO outperforms BO in terms of convergence speed on most test problems, and this benefit becomes more significant as the dimensionality increases. For the 60D functions, KPCA-BO achieves better results than PCA-BO for many test cases. Compared to vanilla BO, it also efficiently reduces the CPU time required to train the GPR model and to optimize the acquisition function.
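A minimal sketch of the two building blocks the abstract names, kernel PCA for the non-linear sub-manifold and a GPR surrogate fitted in the reduced space, composed with scikit-learn; the data, kernels, and candidate point are illustrative assumptions, and the full KPCA-BO acquisition loop is omitted.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Learn a non-linear sub-manifold with kernel PCA, then fit the GPR
# surrogate in the reduced space instead of the full 20-D search space.
rng = np.random.default_rng(3)
X = rng.uniform(-5, 5, size=(40, 20))               # evaluated points, 20-D
y = np.sum(X[:, :2] ** 2, axis=1)                   # objective varies in 2 dims

kpca = KernelPCA(n_components=2, kernel='rbf', fit_inverse_transform=True)
Z = kpca.fit_transform(X)                           # lower-dimensional coordinates
gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5)).fit(Z, y)

z_new = Z.mean(axis=0, keepdims=True)               # a candidate in reduced space
mu, sigma = gpr.predict(z_new, return_std=True)     # surrogate prediction
x_new = kpca.inverse_transform(z_new)               # map back for evaluation
```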
Riemannian CUR Decompositions for Robust Principal Component Analysis
Hamm, Keaton, Meskini, Mohamed, Cai, HanQin
Robust Principal Component Analysis (PCA) has received massive attention in recent years. It aims to recover a low-rank matrix and a sparse matrix from their sum. This paper proposes a novel nonconvex Robust PCA algorithm, coined Riemannian CUR (RieCUR), which utilizes the ideas of Riemannian optimization and robust CUR decompositions. The algorithm has the same computational complexity as Iterated Robust CUR, which is currently state-of-the-art, but is more robust to outliers. RieCUR tolerates a significant number of outliers, comparable to Accelerated Alternating Projections, which has high outlier tolerance but worse computational complexity than the proposed method. Thus, the proposed algorithm achieves state-of-the-art performance on Robust PCA in terms of both computational complexity and outlier tolerance.
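For readers unfamiliar with CUR decompositions, the basic (non-robust, uniformly sampled) version that RieCUR builds on can be sketched in a few lines of NumPy; this is an illustrative baseline, not the paper's algorithm.

```python
import numpy as np

def cur(A, cols, rows):
    """Basic CUR decomposition (illustrative, uniform sampling).

    Approximates A by C @ U @ R, where C and R are actual columns and rows
    of A and U is the pseudoinverse of their intersection. Robust variants
    such as RieCUR build on this idea with careful sampling and Riemannian
    updates, which this sketch does not attempt.
    """
    C = A[:, cols]
    R = A[rows, :]
    U = np.linalg.pinv(A[np.ix_(rows, cols)])
    return C, U, R

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 80))   # rank-5 matrix
cols = rng.choice(80, size=10, replace=False)
rows = rng.choice(100, size=10, replace=False)
C, U, R = cur(A, cols, rows)
print(np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A))   # tiny for low rank
```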