Lippi, Vittorio, Ceccarelli, Giacomo

This paper describes some applications of an incremental implementation of principal component analysis (PCA). The algorithm updates the transformation coefficient matrix on-line for each new sample, without the need to keep all the samples in memory. The algorithm is formally equivalent to the usual batch version, in the sense that, given a sample set, the transformation coefficients at the end of the process are the same. The implications of applying PCA in real time are discussed with the help of data analysis examples. In particular, we focus on the problem of the continuity of the principal components (PCs) during an on-line analysis.
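To illustrate the idea of an on-line update that matches the batch result, here is a minimal Python sketch. It is not the authors' algorithm: it assumes a Welford-style running mean and scatter matrix, from which the PCs are obtained by eigendecomposition on demand. The class name `RunningPCA` and its methods are hypothetical.

```python
import numpy as np


class RunningPCA:
    """Running mean and scatter matrix; PCA is computed on demand.

    Illustrative assumption, not the method of the paper above.
    """

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        # Sum of outer products of the centred samples seen so far.
        self.scatter = np.zeros((dim, dim))

    def update(self, x):
        """Fold one new sample into the statistics without storing it."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean              # deviation from the old mean
        self.mean += delta / self.n        # new mean
        # Welford update: old deviation times new deviation.
        self.scatter += np.outer(delta, x - self.mean)

    def covariance(self):
        """Unbiased sample covariance of all samples seen so far."""
        return self.scatter / (self.n - 1)

    def components(self):
        """Eigenvalues (descending) and eigenvectors as columns."""
        eigvals, eigvecs = np.linalg.eigh(self.covariance())
        order = np.argsort(eigvals)[::-1]
        return eigvals[order], eigvecs[:, order]


# Usage: feed samples one at a time, then compare with the batch covariance.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.2])
pca = RunningPCA(dim=3)
for sample in X:
    pca.update(sample)
print(np.allclose(pca.covariance(), np.cov(X, rowvar=False)))
```

Note that the sign of each eigenvector is arbitrary, so a PC can flip between consecutive updates even when the underlying subspace barely changes. This is one way a continuity problem can arise in on-line use; aligning each new eigenvector with its predecessor (flipping it when their dot product is negative) is a common workaround.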

Principal component analysis (PCA), a well-established technique for data analysis and processing, provides a convenient form of dimensionality reduction that is effective for removing small Gaussian noise present in the data. However, the applicability of standard principal component analysis in real scenarios is limited by its sensitivity to large errors. In this paper, we tackle the challenging problem of recovering data corrupted with errors of high magnitude by developing a novel robust transfer principal component analysis method. Our method is based on the assumption that useful information for the recovery of a corrupted data matrix can be gained from an uncorrupted related data matrix. Specifically, we formulate the data recovery problem as a joint robust principal component analysis problem on the two data matrices, with shared common principal components across matrices and individual principal components specific to each data matrix.
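The sensitivity of standard PCA to large errors can be seen in a small Python sketch. It does not implement the robust transfer method described above; it only shows, on synthetic data chosen for illustration, how a single gross error can redirect the first principal component. The helper `first_pc` is a hypothetical name.

```python
import numpy as np


def first_pc(X):
    """First principal direction of X (rows are samples), via SVD."""
    Xc = X - X.mean(axis=0)                      # centre the data
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]                                 # unit vector, sign arbitrary


rng = np.random.default_rng(0)
t = np.linspace(-1.0, 1.0, 50)
# Clean data: points along the x-axis with small Gaussian noise in y.
clean = np.column_stack([t, 0.01 * rng.standard_normal(50)])

# Corrupted data: the same points with one entry replaced by a gross error.
corrupted = clean.copy()
corrupted[0, 1] = 100.0

print("clean    :", np.abs(first_pc(clean)))
print("corrupted:", np.abs(first_pc(corrupted)))
```

With the clean data the first component lies almost entirely along the x-axis; after corrupting one entry it lies almost entirely along the y-axis, even though 49 of the 50 samples are unchanged.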

'Kernel' principal component analysis (PCA) is an elegant nonlinear generalisation of the popular linear data analysis method, where a kernel function implicitly defines a nonlinear transformation into a feature space wherein standard PCA is performed. Unfortunately, the technique is not 'sparse', since the components thus obtained are expressed in terms of kernels associated with every training vector. This paper shows that by approximating the covariance matrix in feature space by a reduced number of example vectors, using a maximum-likelihood approach, we may obtain a highly sparse form of kernel PCA without loss of effectiveness.

1 Introduction

Principal component analysis (PCA) is a well-established technique for dimensionality reduction, and examples of its many applications include data compression, image processing, visualisation, exploratory data analysis, pattern recognition and time series prediction.
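The non-sparsity of standard kernel PCA can be made concrete with a short Python sketch. It shows ordinary (dense) kernel PCA, not the sparse maximum-likelihood variant proposed in the paper. The RBF kernel, the `gamma` parameter, and the function names are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * sq_dist)


def kernel_pca_fit(X, n_components, gamma=1.0):
    """Standard kernel PCA on the training set X (rows are samples).

    Returns the top eigenvalues, the expansion coefficients `alphas`
    (one row per training vector), and the centred kernel matrix.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one = np.full((n, n), 1.0 / n)
    # Centre the data in feature space without computing the feature map.
    Kc = K - one @ K - K @ one + one @ K @ one
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale so that each feature-space component has unit norm.
    alphas = eigvecs / np.sqrt(eigvals)
    return eigvals, alphas, Kc


rng = np.random.default_rng(0)
X = rng.standard_normal((30, 2))
eigvals, alphas, Kc = kernel_pca_fit(X, n_components=2, gamma=0.5)
projections = Kc @ alphas      # training points projected onto the components
print(alphas.shape)            # one coefficient per training vector per component
```

Each component is a weighted sum of kernel functions centred on the training vectors, with the weights stored in a column of `alphas`. In general every weight is non-zero, so evaluating a component at a new point requires a kernel evaluation against the whole training set, which is the cost a sparse formulation aims to reduce.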

In a previous post I summarized the tasks and procedures available in SAS Viya Data Mining and Machine Learning. In this post, I'll dive into the unsupervised learning category which currently hosts several tasks: Kmeans, Kmodes, and Kprototypes Clustering, Outlier Detection, and a few variants of Principal Component Analysis. In unsupervised learning there are no known labels (outcomes), only attributes (inputs). Examples include clustering, association, and segmentation. Machine learning finds high density areas (in multidimensional space) that are more or less similar to each other, and identifies structures in the data that separate these areas.