Principal Component Analysis
The Stability of Kernel Principal Components Analysis and its Relation to the Process Eigenspectrum
Williams, Christopher, Shawe-Taylor, John S.
In this paper we analyze the relationships between the eigenvalues of the m x m Gram matrix K for a kernel k(·, ·) and the eigenvalues of the corresponding continuous eigenproblem. We bound the differences between the two spectra and provide a performance bound on kernel PCA. 1 Introduction Over recent years there has been a considerable amount of interest in kernel methods for supervised learning (e.g. Support Vector Machines and Gaussian Process prediction) and for unsupervised learning (e.g. kernel PCA). In this paper we study the stability of the subspace of feature space extracted by kernel PCA with respect to the sample of size m, and relate this to the feature space that would be extracted in the infinite sample-size limit. This analysis essentially "lifts" into (a potentially infinite-dimensional) feature space an analysis which can also be carried out for PCA, comparing the k-dimensional eigenspace extracted from a sample covariance matrix with the k-dimensional eigenspace extracted from the population covariance matrix, and comparing the residuals from the k-dimensional compression for the m-sample and the population.
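As a rough illustration of the quantities discussed above (not the paper's bounds), the sketch below computes the eigenvalues of an m x m Gram matrix and divides them by m, which gives the empirical estimates of the process eigenvalues; the RBF kernel, its width, and the synthetic data are assumptions made purely for this sketch.

```python
# Minimal kernel PCA sketch: empirical Gram-matrix eigenspectrum.
# The RBF kernel and synthetic data are illustrative assumptions; the
# scaled eigenvalues lambda_i / m play the role of estimates of the
# process (population) eigenvalues discussed in the abstract.
import numpy as np

def rbf_gram(X, gamma=1.0):
    """m x m Gram matrix K with K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca_spectrum(X, gamma=1.0):
    m = X.shape[0]
    K = rbf_gram(X, gamma)
    # Centre in feature space: K_c = (I - 1/m) K (I - 1/m)
    H = np.eye(m) - np.ones((m, m)) / m
    Kc = H @ K @ H
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1]          # descending order
    return eigvals[order] / m, eigvecs[:, order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))              # m = 200 samples
    lam, _ = kernel_pca_spectrum(X, gamma=0.5)
    print("leading empirical process-eigenvalue estimates:", lam[:5])
```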
A Generalization of Principal Components Analysis to the Exponential Family
Collins, Michael, Dasgupta, S., Schapire, Robert E.
Principal component analysis (PCA) is a commonly applied technique for dimensionality reduction. PCA implicitly minimizes a squared loss function, which may be inappropriate for data that is not real-valued, such as binary-valued data. This paper draws on ideas from the exponential family, generalized linear models, and Bregman distances, to give a generalization of PCA to loss functions that we argue are better suited to other data types. We describe algorithms for minimizing the loss functions, and give examples on simulated data.
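One concrete instance of this idea, sketched below under stated assumptions, is a rank-k "logistic PCA" for binary data: instead of squared loss, it minimizes the Bernoulli log-loss of a low-rank matrix of natural parameters. Plain gradient descent, the rank, and the learning rate are assumptions for illustration; the paper describes its own minimization algorithms.

```python
# Sketch of exponential-family PCA for binary data (logistic PCA).
# The data are modelled through a rank-k matrix of natural parameters
# Theta = A @ B, fitted by minimising the Bernoulli log-loss with plain
# gradient descent. This is an illustrative sketch, not the paper's
# algorithm.
import numpy as np

def logistic_pca(X, k=2, steps=2000, lr=0.1, seed=0):
    """X: (n, d) binary matrix. Returns factors A (n, k) and B (k, d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = 0.01 * rng.normal(size=(n, k))
    B = 0.01 * rng.normal(size=(k, d))
    for _ in range(steps):
        P = 1.0 / (1.0 + np.exp(-(A @ B)))   # Bernoulli mean parameters
        G = P - X                            # gradient of the log-loss w.r.t. Theta
        gA, gB = G @ B.T, A.T @ G
        A -= lr * gA / d
        B -= lr * gB / n
    return A, B

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Theta_true = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 20))
    X = (1.0 / (1.0 + np.exp(-Theta_true)) > rng.uniform(size=(100, 20))).astype(float)
    A, B = logistic_pca(X, k=2)
    recon = 1.0 / (1.0 + np.exp(-(A @ B)))
    print("mean absolute reconstruction error:", np.mean(np.abs(recon - X)))
```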
Face Recognition Using Kernel Methods
Principal Component Analysis and Fisher Linear Discriminant methods have demonstrated their success in face detection, recognition, and tracking. The representation in these subspace methods is based on second order statistics of the image set, and does not address higher order statistical dependencies such as the relationships among three or more pixels. Recently Higher Order Statistics and Independent Component Analysis (ICA) have been used as informative low dimensional representations for visual recognition. In this paper, we investigate the use of Kernel Principal Component Analysis and Kernel Fisher Linear Discriminant for learning low dimensional representations for face recognition, which we call Kernel Eigenface and Kernel Fisherface methods. While Eigenface and Fisherface methods aim to find projection directions based on the second order correlation of samples, Kernel Eigenface and Kernel Fisherface methods provide generalizations which take higher order correlations into account.
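A minimal sketch of the Kernel Eigenface idea follows: vectorized face images are projected with kernel PCA and classified in the reduced space with a nearest-neighbour rule. The polynomial kernel, its degree, the number of components, the classifier, and the synthetic data are all assumptions made for this sketch; they are not the paper's experimental setup.

```python
# Illustrative Kernel Eigenface pipeline: kernel PCA features followed
# by a nearest-neighbour classifier. Kernel choice, degree, component
# count and the synthetic data are assumptions for this sketch.
import numpy as np

def poly_kernel(A, B, degree=2):
    return (A @ B.T + 1.0) ** degree

def kpca_fit(X, n_components=15, degree=2):
    m = X.shape[0]
    K = poly_kernel(X, X, degree)
    H = np.eye(m) - np.ones((m, m)) / m
    lam, V = np.linalg.eigh(H @ K @ H)
    idx = np.argsort(lam)[::-1][:n_components]
    lam, V = np.maximum(lam[idx], 1e-12), V[:, idx]
    alphas = V / np.sqrt(lam)                  # normalised expansion coefficients
    return {"X": X, "K": K, "alphas": alphas, "degree": degree}

def kpca_transform(model, Xnew):
    X, K, alphas, degree = model["X"], model["K"], model["alphas"], model["degree"]
    m = X.shape[0]
    Kn = poly_kernel(Xnew, X, degree)
    ones_m = np.ones((m, m)) / m
    ones_t = np.ones((Xnew.shape[0], m)) / m
    # Centre the test kernel consistently with the training centring.
    Kc = Kn - ones_t @ K - Kn @ ones_m + ones_t @ K @ ones_m
    return Kc @ alphas

def nearest_neighbour(train_f, train_y, test_f):
    d = np.linalg.norm(test_f[:, None, :] - train_f[None, :, :], axis=2)
    return train_y[np.argmin(d, axis=1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    faces = rng.normal(size=(60, 256))         # stand-in for vectorised face images
    labels = np.repeat(np.arange(6), 10)
    model = kpca_fit(faces)
    feats = kpca_transform(model, faces)
    print("training accuracy:", np.mean(nearest_neighbour(feats, labels, feats) == labels))
```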
Sparse Kernel Principal Component Analysis
'Kernel' principal component analysis (PCA) is an elegant nonlinear generalisation of the popular linear data analysis method, where a kernel function implicitly defines a nonlinear transformation into a feature space wherein standard PCA is performed. Unfortunately, the technique is not 'sparse', since the components thus obtained are expressed in terms of kernels associated with every training vector. This paper shows that by approximating the covariance matrix in feature space by a reduced number of example vectors, using a maximum-likelihood approach, we may obtain a highly sparse form of kernel PCA without loss of effectiveness. 1 Introduction Principal component analysis (PCA) is a well-established technique for dimensionality reduction, and examples of its many applications include data compression, image processing, visualisation, exploratory data analysis, pattern recognition and time series prediction.
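The sketch below is not the paper's maximum-likelihood construction; it only illustrates the sparsity goal by expressing approximate components through a small random subset of r << m training vectors (a Nyström-style approximation). The subset size, kernel, and data are assumptions for illustration.

```python
# Illustration of the sparsity goal only: approximate kernel PCA whose
# components involve a small subset of r << m training vectors via a
# Nystrom-style approximation of the Gram matrix. This is NOT the
# paper's maximum-likelihood procedure.
import numpy as np

def rbf(A, B, gamma=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * d2)

def sparse_kpca(X, r=25, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    idx = rng.choice(m, size=r, replace=False)   # the r retained "example vectors"
    Xr = X[idx]
    Kmr = rbf(X, Xr, gamma)                      # (m, r) cross-kernel
    Krr = rbf(Xr, Xr, gamma)                     # (r, r) kernel on the subset
    # Nystrom approximation K ~ Kmr Krr^{-1} Kmr^T: eigen-decomposing the
    # small matrix yields components expressed only through the r vectors.
    lam, U = np.linalg.eigh(Krr + 1e-8 * np.eye(r))
    Phi = Kmr @ U / np.sqrt(np.maximum(lam, 1e-12))   # approximate feature map
    Phi = Phi - Phi.mean(axis=0)
    C = Phi.T @ Phi / m                          # approximate feature-space covariance
    w, V = np.linalg.eigh(C)
    order = np.argsort(w)[::-1]
    return Xr, V[:, order], w[order]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.normal(size=(300, 4))
    Xr, V, w = sparse_kpca(X, r=25, gamma=0.5)
    print("leading eigenvalues of the sparse approximation:", np.round(w[:5], 4))
```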
Self-Organizing Rules for Robust Principal Component Analysis
Using statistical physics techniques including the Gibbs distribution, binary decision fields and effective energies, we propose self-organizing PCA rules which are capable of resisting outliers while fulfilling various PCA-related tasks such as obtaining the first principal component vector, the first k principal component vectors, and directly finding the subspace spanned by the first k principal component vectors without solving for each vector individually. Comparative experiments have shown that the proposed robust rules improve the performance of the existing PCA algorithms significantly when outliers are present.
Principal Component Analysis (PCA) is an essential technique for data compression and feature extraction, and has been widely used in statistical data analysis, communication theory, pattern recognition and image processing. In the neural network literature, many studies have examined learning rules for implementing PCA, or networks closely related to PCA (see Xu & Yuille, 1993 for a detailed reference list containing more than 30 papers related to these issues). A generic illustration of the outlier-resistance idea is sketched below.
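The paper's rules are derived from Gibbs distributions, binary decision fields and effective energies; the sketch below is only a generic illustration of the underlying idea, an Oja-style online rule for the first principal vector in which a sample's update is down-weighted when its reconstruction residual is large, so gross outliers cannot dominate. The weighting function and its scale are assumptions, not the paper's rule.

```python
# Generic robust online PCA illustration (not the paper's derivation):
# an Oja-style update for the first principal vector where samples with
# large reconstruction residual are down-weighted, limiting the
# influence of outliers. The Cauchy-style weight and its scale are
# assumptions for this sketch.
import numpy as np

def robust_first_pc(X, lr=0.05, epochs=20, scale=3.0, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in rng.permutation(X):
            y = w @ x                              # projection onto current direction
            resid = np.linalg.norm(x - y * w)      # reconstruction residual
            beta = 1.0 / (1.0 + (resid / scale) ** 2)   # down-weight large residuals
            w += lr * beta * y * (x - y * w)       # Oja-style update
            w /= np.linalg.norm(w)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 0.3]])
    X[:10] += 50.0                                 # a few gross outliers
    print("robust first principal vector:", robust_first_pc(X))
```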
Node Splitting: A Constructive Algorithm for Feed-Forward Neural Networks
A constructive algorithm is proposed for feed-forward neural networks, which uses node-splitting in the hidden layers to build large networks from smaller ones. The small network forms an approximate model of a set of training data, and the split creates a larger more powerful network which is initialised with the approximate solution already found. The insufficiency of the smaller network in modelling the system which generated the data leads to oscillation in those hidden nodes whose weight vectors cover regions in the input space where more detail is required in the model. These nodes are identified and split in two using principal component analysis, allowing the new nodes to cover the two main modes of each oscillating vector. Nodes are selected for splitting using principal component analysis on the oscillating weight vectors, or by examining the Hessian matrix of second derivatives of the network error with respect to the weights.
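A sketch of the splitting step itself (not the full constructive training loop) is given below: the first principal component of a hidden node's recorded weight-update vectors gives the direction of oscillation, and the node is replaced by two children displaced along that direction in opposite senses. The displacement scale and the synthetic update history are assumptions for this sketch.

```python
# Sketch of the node-splitting step only: PCA on a hidden node's
# recorded weight updates identifies the oscillation direction, and two
# child nodes are created by displacing the original weights along it
# in opposite senses. The displacement scale is an assumption.
import numpy as np

def split_node(weight_vector, update_history, displacement=0.5):
    """weight_vector: (d,) weights of the oscillating hidden node.
    update_history: (t, d) recorded weight updates for that node."""
    U = update_history - update_history.mean(axis=0)
    _, _, Vt = np.linalg.svd(U, full_matrices=False)
    direction = Vt[0]                              # first principal component
    scale = displacement * np.std(U @ direction)   # spread along that direction
    child_a = weight_vector + scale * direction
    child_b = weight_vector - scale * direction
    return child_a, child_b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=8)
    history = 0.01 * rng.normal(size=(100, 8))
    history += np.outer(np.sin(np.linspace(0, 20, 100)), rng.normal(size=8))  # oscillation
    a, b = split_node(w, history)
    print("children differ along:", np.round(a - b, 2))
```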