Principal Component Analysis
Unsupervised Learning by Convex and Conic Coding
Unsupervised learning algorithms based on convex and conic encoders are proposed. The encoders find the closest convex or conic combination of basis vectors to the input. The learning algorithms produce basis vectors that minimize the reconstruction error of the encoders. The convex algorithm develops locally linear models of the input, while the conic algorithm discovers features. Both algorithms are used to model handwritten digits and compared with vector quantization and principal component analysis.
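As a rough illustration (not the authors' exact procedure), the conic encoder can be viewed as a nonnegative least-squares fit of the input to the basis, with the basis adapted by gradient steps on the reconstruction error; the convex encoder would additionally constrain the coefficients to sum to one. The function names, learning rate, and initialization below are hypothetical choices.

```python
# Hypothetical sketch of conic coding: encode each input as the closest
# nonnegative (conic) combination of basis vectors, then update the basis
# by gradient descent on the reconstruction error.
import numpy as np
from scipy.optimize import nnls

def conic_encode(W, x):
    # Find h >= 0 minimizing ||x - W h||^2 (the conic encoder).
    h, _ = nnls(W, x)
    return h

def learn_basis(X, r, n_iters=50, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.random((d, r))            # basis vectors (columns)
    for _ in range(n_iters):
        H = np.column_stack([conic_encode(W, X[:, i]) for i in range(n)])
        W += lr * (X - W @ H) @ H.T   # reduce the reconstruction error
    return W
```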
EM Algorithms for PCA and SPCA
I present an expectation-maximization (EM) algorithm for principal component analysis (PCA). The algorithm allows a few eigenvectors and eigenvalues to be extracted from large collections of high dimensional data. It is computationally very efficient in space and time. I also introduce a new variant of PCA called sensible principal component analysis (SPCA), which defines a proper density model in the data space. Learning for SPCA is also done with an EM algorithm.
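A minimal sketch of a common form of EM for PCA in the zero-noise limit, consistent with the description above: the E-step projects the data onto the current subspace, and the M-step re-estimates the loading matrix from those projections. Variable names and the final orthonormalization are illustrative choices, not taken from the paper.

```python
# Minimal sketch of EM for PCA in the zero-noise limit: alternate between
# inferring low-dimensional states and re-estimating the loading matrix.
import numpy as np

def em_pca(Y, k, n_iters=100, seed=0):
    """Y: (d, n) data matrix with zero-mean columns; k: number of components."""
    rng = np.random.default_rng(seed)
    d, n = Y.shape
    C = rng.standard_normal((d, k))                 # initial loading matrix
    for _ in range(n_iters):
        # E-step: project the data onto the current subspace.
        X = np.linalg.solve(C.T @ C, C.T @ Y)
        # M-step: re-estimate the loadings from those projections.
        C = Y @ X.T @ np.linalg.inv(X @ X.T)
    # Orthonormalize to recover the leading principal subspace.
    Q, _ = np.linalg.qr(C)
    return Q
```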
Learning Generative Models with the Up Propagation Algorithm
Up-propagation is an algorithm for inverting and learning neural network generative models. Sensory input is processed by inverting a model that generates patterns from hidden variables using top-down connections. The inversion process is iterative, utilizing a negative feedback loop that depends on an error signal propagated by bottom-up connections. The error signal is also used to learn the generative model from examples. The algorithm is benchmarked against principal component analysis in experiments on images of handwritten digits.
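The following sketch illustrates the inversion loop on an assumed one-layer generative model (a linear map of sigmoidal hidden units); it is not the paper's exact architecture or update rule, but it shows the negative feedback structure: the reconstruction error is fed back through the top-down weights to update the hidden variables, and the same error drives weight learning.

```python
# Illustrative sketch (not the paper's exact updates): invert a one-layer
# generative model s ~ W @ sigmoid(x) by feeding the reconstruction error
# back through the top-down weights.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def invert(W, s, n_steps=200, lr=0.1):
    """Find hidden variables x whose top-down generation matches input s."""
    x = np.zeros(W.shape[1])
    for _ in range(n_steps):
        e = s - W @ sigmoid(x)                                # feedback error signal
        x += lr * (W.T @ e) * sigmoid(x) * (1 - sigmoid(x))   # bottom-up correction
    return x

def learn_step(W, s, lr=0.05):
    """Use the same error signal to adapt the generative weights."""
    x = invert(W, s)
    e = s - W @ sigmoid(x)
    return W + lr * np.outer(e, sigmoid(x))
```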
Robust Learning of Chaotic Attractors
A fundamental problem with the modeling of chaotic time series data is that minimizing short-term prediction errors does not guarantee a match between the reconstructed attractors of model and experiments. We introduce a modeling paradigm that simultaneously learns to predict in the short term and to locate the outlines of the attractor by a new way of nonlinear principal component analysis. Closed-loop predictions are constrained to stay within these outlines, to prevent divergence from the attractor. Learning is exceptionally fast: parameter estimation for the 1000-sample laser data from the 1991 Santa Fe time series competition took less than a minute on a 166 MHz Pentium PC.
Periodic Component Analysis: An Eigenvalue Method for Representing Periodic Structure in Speech
An eigenvalue method is developed for analyzing periodic structure in speech. Signals are analyzed by a matrix diagonalization reminiscent of methods for principal component analysis (PCA) and independent component analysis (ICA). Our method, called periodic component analysis (πCA), uses constructive interference to enhance periodic components of the frequency spectrum and destructive interference to cancel noise. The front end emulates important aspects of auditory processing, such as cochlear filtering, nonlinear compression, and insensitivity to phase, with the aim of approaching the robustness of human listeners. The method avoids the inefficiencies of autocorrelation at the pitch period: it does not require long delay lines, and it correlates signals at a clock rate on the order of the actual pitch, as opposed to the original sampling rate.
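One way to cast the idea as an eigenvalue problem, shown purely for illustration, is a generalized eigenproblem that trades off lag-τ coherence against total signal power; the matrices A and B below are assumptions for the sketch, not the paper's exact construction.

```python
# Hedged illustration of the eigenvalue idea: find a weighting of filterbank
# channels that emphasizes lag-tau (pitch-period) coherence relative to total
# power, via a generalized symmetric eigenvalue problem.
import numpy as np
from scipy.linalg import eigh

def periodic_component(X, tau):
    """X: (channels, time) filterbank outputs; tau: candidate pitch period."""
    D = X[:, tau:] - X[:, :-tau]        # lag-tau difference (periodic parts cancel)
    A = D @ D.T / D.shape[1]            # covariance of the lag-tau difference
    B = X @ X.T / X.shape[1]            # total signal covariance
    evals, evecs = eigh(A, B)           # generalized eigenvalue problem
    return evecs[:, 0]                  # weights minimizing A relative to B
```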
Sparse Kernel Principal Component Analysis
'Kernel' principal component analysis (PCA) is an elegant nonlinear generalisation of the popular linear data analysis method, where a kernel function implicitly defines a nonlinear transformation into a feature space wherein standard PCA is performed. Unfortunately, the technique is not 'sparse', since the components thus obtained are expressed in terms of kernels associated with every training vector. This paper shows that by approximating the covariance matrix in feature space by a reduced number of example vectors, using a maximum-likelihood approach, we may obtain a highly sparse form of kernel PCA without loss of effectiveness.
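For context, here is a minimal sketch of standard (dense) kernel PCA, in which every extracted component carries one coefficient per training vector; the paper's contribution is to replace this dense expansion with a small set of example vectors. The RBF kernel and its width are illustrative choices.

```python
# Minimal sketch of standard (dense) kernel PCA: each extracted component is
# a combination of kernels on *all* training points, which is the lack of
# sparsity the paper addresses.
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_pca(X, k, gamma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                            # center the data in feature space
    evals, evecs = np.linalg.eigh(Kc)
    idx = np.argsort(evals)[::-1][:k]         # leading eigenvectors
    alphas = evecs[:, idx] / np.sqrt(np.maximum(evals[idx], 1e-12))
    return alphas                             # one coefficient per training vector
```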
Automatic Choice of Dimensionality for PCA
A central issue in principal component analysis (PCA) is choosing the number of principal components to be retained. By interpreting PCA as density estimation, we show how to use Bayesian model selection to estimate the true dimensionality of the data. The resulting estimate is simple to compute yet guaranteed to pick the correct dimensionality, given enough data. The estimate involves an integral over the Stiefel manifold of k-frames, which is difficult to compute exactly. But after choosing an appropriate parameterization and applying Laplace's method, an accurate and practical estimator is obtained.
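As a simplified stand-in for the Laplace-approximation evidence (the exact Stiefel-manifold integral is not reproduced here), one can score each candidate dimensionality by the probabilistic-PCA log-likelihood with a BIC-style penalty; the parameter count below is a rough assumption, not the paper's formula.

```python
# Simplified stand-in for the paper's evidence approximation: score each
# candidate dimensionality with the probabilistic-PCA log-likelihood plus a
# BIC-style complexity penalty.
import numpy as np

def choose_dimensionality(X):
    """X: (n, d) data matrix. Returns the k with the best penalized score."""
    n, d = X.shape
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]   # descending eigenvalues
    best_k, best_score = 1, -np.inf
    for k in range(1, d):
        sigma2 = max(lam[k:].mean(), 1e-12)                   # ML noise variance
        loglik = -0.5 * n * (np.log(np.maximum(lam[:k], 1e-12)).sum()
                             + (d - k) * np.log(sigma2)
                             + d * np.log(2 * np.pi) + d)
        m = d * k - k * (k - 1) / 2 + 1                       # rough parameter count
        score = loglik - 0.5 * m * np.log(n)                  # BIC-style penalty
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```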
A Generalization of Principal Components Analysis to the Exponential Family
Principal component analysis (PCA) is a commonly applied technique for dimensionality reduction. PCA implicitly minimizes a squared loss function, which may be inappropriate for data that is not real-valued, such as binary-valued data. This paper draws on ideas from the exponential family, generalized linear models, and Bregman distances to give a generalization of PCA to loss functions that we argue are better suited to other data types. We describe algorithms for minimizing the loss functions and give examples on simulated data.
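A hedged sketch of the idea for binary data: replace the squared loss with the Bernoulli (logistic) loss on a low-rank natural-parameter matrix Theta = U V^T and optimize it by alternating gradient steps. This is an illustrative instance of exponential-family PCA, not the paper's specific algorithm.

```python
# Hedged sketch of the exponential-family idea for binary data: fit a low-rank
# natural-parameter matrix Theta = U @ V.T under the Bernoulli log-likelihood
# by alternating gradient ascent steps.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_pca(X, k, n_iters=500, lr=0.05, seed=0):
    """X: (n, d) binary matrix. Returns low-rank factors U (n, k) and V (d, k)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = 0.01 * rng.standard_normal((n, k))
    V = 0.01 * rng.standard_normal((d, k))
    for _ in range(n_iters):
        G = X - sigmoid(U @ V.T)       # gradient of the Bernoulli log-likelihood
        U += lr * G @ V                # alternate ascent steps on U and V
        V += lr * G.T @ U
    return U, V
```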
Sampling Techniques for Kernel Methods
We propose randomized techniques for speeding up Kernel Principal Component Analysis on three levels: sampling and quantization of the Gram matrix in training, randomized rounding in evaluating the kernel expansions, and random projections in evaluating the kernel itself. In all three cases, we give sharp bounds on the accuracy of the obtained approximations. Rather intriguingly, all three techniques can be viewed as instantiations of the following idea: replace the kernel function by a "randomized kernel" that behaves like the original kernel in expectation.
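To illustrate the "randomized kernel" idea for the first level (sampling the Gram matrix), one might keep each entry independently with probability p and rescale by 1/p, so that the sparsified matrix matches the original in expectation; the function below is a hypothetical sketch, not the paper's exact sampling scheme.

```python
# Illustrative sketch: keep each Gram-matrix entry with probability p and
# rescale by 1/p, so the sparsified matrix equals the original in expectation.
import numpy as np

def sample_gram(K, p, seed=0):
    """K: (n, n) symmetric kernel matrix; p: probability of keeping an entry."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    mask = rng.random((n, n)) < p
    mask = np.triu(mask) | np.triu(mask, 1).T      # keep the matrix symmetric
    return np.where(mask, K / p, 0.0)
```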