Goto

Collaborating Authors

 Statistical Learning


Sparse Representation and Its Applications in Blind Source Separation

Neural Information Processing Systems

In this paper, sparse representation (factorization) of a data matrix is first discussed. An overcomplete basis matrix is estimated by using the K means method.


Online Learning via Global Feedback for Phrase Recognition

Neural Information Processing Systems

This work presents an architecture based on perceptrons to recognize phrase structures, and an online learning algorithm to train the perceptrons together and dependently. The recognition strategy applies learning in two layers: a filtering layer, which reduces the search space by identifying plausible phrase candidates, and a ranking layer, which recursively builds the optimal phrase structure. We provide a recognition-based feedback rule which reflects to each local function its committed errors from a global point of view, and allows to train them together online as perceptrons. Experimentation on a syntactic parsing problem, the recognition of clause hierarchies, improves state-of-the-art results and evinces the advantages of our global training method over optimizing each function locally and independently.


Large Scale Online Learning

Neural Information Processing Systems

We consider situations where training data is abundant and computing resources are comparatively scarce. We argue that suitably designed online learning algorithms asymptotically outperform any batch learning algorithm. Both theoretical and experimental evidences are presented.


Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering

Neural Information Processing Systems

Several unsupervised learning algorithms based on an eigendecomposition provide either an embedding or a clustering only for given training points, with no straightforward extension for out-of-sample examples short of recomputing eigenvectors. This paper provides a unified framework for extending Local Linear Embedding (LLE), Isomap, Laplacian Eigenmaps, Multi-Dimensional Scaling (for dimensionality reduction) as well as for Spectral Clustering. This framework is based on seeing these algorithms as learning eigenfunctions of a data-dependent kernel. Numerical experiments show that the generalizations performed have a level of error comparable to the variability of the embedding algorithms due to the choice of training data.


Optimal Manifold Representation of Data: An Information Theoretic Approach

Neural Information Processing Systems

We introduce an information theoretic method for nonparametric, nonlinear dimensionality reduction, based on the infinite cluster limit of rate distortion theory. By constraining the information available to manifold coordinates, a natural probabilistic map emerges that assigns original data to corresponding points on a lower dimensional manifold. With only the information-distortion trade off as a parameter, our method determines the shape of the manifold, its dimensionality, the probabilistic map and the prior that provide optimal description of the data.


Locality Preserving Projections

Neural Information Processing Systems

Many problems in information processing involve some form of dimensionality reduction. In this paper, we introduce Locality Preserving Projections (LPP). These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data set. LPP should be seen as an alternative to Principal Component Analysis (PCA) - a classical linear technique that projects the data along the directions of maximal variance. When the high dimensional data lies on a low dimensional manifold embedded in the ambient space, the Locality Preserving Projections are obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace Beltrami operator on the manifold.


Linear Dependent Dimensionality Reduction

Neural Information Processing Systems

We formulate linear dimensionality reduction as a semi-parametric estimation problem, enabling us to study its asymptotic behavior. We generalize the problem beyond additive Gaussian noise to (unknown) non-Gaussian additive noise, and to unbiased non-additive models.


Dynamical Modeling with Kernels for Nonlinear Time Series Prediction

Neural Information Processing Systems

We consider the question of predicting nonlinear time series. Kernel Dynamical Modeling (KDM), a new method based on kernels, is proposed as an extension to linear dynamical models. The kernel trick is used twice: first, to learn the parameters of the model, and second, to compute preimages of the time series predicted in the feature space by means of Support Vector Regression. Our model shows strong connection with the classic Kalman Filter model, with the kernel feature space as hidden state space. Kernel Dynamical Modeling is tested against two benchmark time series and achieves high quality predictions.


Fast Feature Selection from Microarray Expression Data via Multiplicative Large Margin Algorithms

Neural Information Processing Systems

New feature selection algorithms for linear threshold functions are described which combine backward elimination with an adaptive regularization method. This makes them particularly suitable to the classification of microarray expression data, where the goal is to obtain accurate rules depending on few genes only. Our algorithms are fast and easy to implement, since they center on an incremental (large margin) algorithm which allows us to avoid linear, quadratic or higher-order programming methods. We report on preliminary experiments with five known DNA microarray datasets. These experiments suggest that multiplicative large margin algorithms tend to outperform additive algorithms (such as SVM) on feature selection tasks.


Sequential Bayesian Kernel Regression

Neural Information Processing Systems

We propose a method for sequential Bayesian kernel regression. As is the case for the popular Relevance Vector Machine (RVM) [10, 11], the method automatically identifies the number and locations of the kernels. Our algorithm overcomes some of the computational difficulties related to batch methods for kernel regression. It is non-iterative, and requires only a single pass over the data. It is thus applicable to truly sequential data sets and batch data sets alike. The algorithm is based on a generalisation of Importance Sampling, which allows the design of intuitively simple and efficient proposal distributions for the model parameters. Comparative results on two standard data sets show our algorithm to compare favourably with existing batch estimation strategies.