Goto

Collaborating Authors

 Country


Escaping the Convex Hull with Extrapolated Vector Machines

Neural Information Processing Systems

Maximum margin classifiers such as Support Vector Machines (SVMs) critically depends upon the convex hulls of the training samples of each class, as they implicitly search for the minimum distance between the convex hulls. We propose Extrapolated Vector Machines (XVMs) which rely on extrapolations outside these convex hulls. XVMs improve SVM generalization very significantly on the MNIST [7] OCR data. They share similarities with the Fisher discriminant: maximize the inter-class margin while minimizing the intra-class disparity.


Discriminative Direction for Kernel Classifiers

Neural Information Processing Systems

In many scientific and engineering applications, detecting and understanding differences between two groups of examples can be reduced to a classical problem of training a classifier for labeling new examples while making as few mistakes as possible. In the traditional classification setting, the resulting classifier is rarely analyzed in terms of the properties of the input data captured by the discriminative model. However, such analysis is crucial if we want to understand and visualize the detected differences. We propose an approach to interpretation of the statistical model in the original feature space that allows us to argue about the model in terms of the relevant changes to the input vectors. For each point in the input space, we define a discriminative direction to be the direction that moves the point towards the other class while introducing as little irrelevant change as possible with respect to the classifier function. We derive the discriminative direction for kernel-based classifiers, demonstrate the technique on several examples and briefly discuss its use in the statistical shape analysis, an application that originally motivated this work.


Very loopy belief propagation for unwrapping phase images

Neural Information Processing Systems

Since the discovery that the best error-correcting decoding algorithm can be viewed as belief propagation in a cycle-bound graph, researchers have been trying to determine under what circumstances "loopy belief propagation" is effective for probabilistic inference. Despite several theoretical advances in our understanding of loopy belief propagation, to our knowledge, the only problem that has been solved using loopy belief propagation is error-correcting decoding on Gaussian channels. We propose a new representation for the two-dimensional phase unwrapping problem, and we show that loopy belief propagation produces results that are superior to existing techniques. This is an important result, since many imaging techniques, including magnetic resonance imaging and interferometric synthetic aperture radar, produce phase-wrapped images. Interestingly, the graph that we use has a very large number of very short cycles, supporting evidence that a large minimum cycle length is not needed for excellent results using belief propagation. 1 Introduction Phase unwrapping is an easily stated, fundamental problem in image processing (Ghiglia and Pritt 1998).


Product Analysis: Learning to Model Observations as Products of Hidden Variables

Neural Information Processing Systems

Factor analysis and principal components analysis can be used to model linear relationships between observed variables and linearly map high-dimensional data to a lower-dimensional hidden space. In factor analysis, the observations are modeled as a linear combination of normally distributed hidden variables. We describe a nonlinear generalization of factor analysis, called "product analysis", that models the observed variables as a linear combination of products of normally distributed hidden variables. Just as factor analysis can be viewed as unsupervised linear regression on unobserved, normally distributed hidden variables, product analysis can be viewed as unsupervised linear regression on products of unobserved, normally distributed hidden variables. The mapping between the data and the hidden space is nonlinear, so we use an approximate variational technique for inference and learning.


Fast, Large-Scale Transformation-Invariant Clustering

Neural Information Processing Systems

In previous work on "transformed mixtures of Gaussians" and "transformed hidden Markov models", we showed how the EM algorithm in a discrete latent variable model can be used to jointly normalize data (e.g., center images, pitch-normalize spectrograms) and learn a mixture model of the normalized data. The only input to the algorithm is the data, a list of possible transformations, and the number of clusters to find. The main criticism of this work was that the exhaustive computation of the posterior probabilities over transformations would make scaling up to large feature vectors and large sets of transformations intractable. Here, we describe how a tremendous speedup is acheived through the use of a variational technique for decoupling transformations, and a fast Fourier transform method for computing posterior probabilities.


KLD-Sampling: Adaptive Particle Filters

Neural Information Processing Systems

Over the last years, particle filters have been applied with great success to a variety of state estimation problems. We present a statistical approach to increasing the efficiency of particle filters by adapting the size of sample sets on-the-fly. The key idea of the KLD-sampling method is to bound the approximation error introduced by the sample-based representation of the particle filter. The name KLD-sampling is due to the fact that we measure the approximation error by the Kullback-Leibler distance. Our adaptation approach chooses a small number of samples if the density is focused on a small part of the state space, and it chooses a large number of samples if the state uncertainty is high. Both the implementation and computation overhead of this approach are small. Extensive experiments using mobile robot localization as a test application show that our approach yields drastic improvements over particle filters with fixed sample set sizes and over a previously introduced adaptation technique.


Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM

Neural Information Processing Systems

We propose a framework based on a parametric quadratic programming (QP) technique to solve the support vector machine (SVM) training problem. This framework, can be specialized to obtain two SVM optimization methods. The first solves the fixed bias problem, while the second starts with an optimal solution for a fixed bias problem and adjusts the bias until the optimal value is found. The later method can be applied in conjunction with any other existing technique which obtains a fixed bias solution. Moreover, the second method can also be used independently to solve the complete SVM training problem. A combination of these two methods is more flexible than each individual method and, among other things, produces an incremental algorithm which exactly solve the 1-Norm Soft Margin SVM optimization problem. Applying Selective Sampling techniques may further boost convergence.


Adaptive Sparseness Using Jeffreys Prior

Neural Information Processing Systems

In this paper we introduce a new sparseness inducing prior which does not involve any (hyper)parameters that need to be adjusted or estimated. Although other applications are possible, we focus here on supervised learning problems: regression and classification. Experiments with several publicly available benchmark data sets show that the proposed approach yields state-of-the-art performance. In particular, our method outperforms support vector machines and performs competitively with the best alternative techniques, both in terms of error rates and sparseness, although it involves no tuning or adjusting of sparsenesscontrolling hyper-parameters.


Approximate Dynamic Programming via Linear Programming

Neural Information Processing Systems

The curse of dimensionality gives rise to prohibitive computational requirements that render infeasible the exact solution of large-scale stochastic control problems. We study an efficient method based on linear programming for approximating solutions to such problems. The approach "fits" a linear combination of pre-selected basis functions to the dynamic programming cost-to- go function. We develop bounds on the approximation error and present experimental results in the domain of queueing network control, providing empirical support for the methodology.


A kernel method for multi-labelled classification

Neural Information Processing Systems

This article presents a Support Vector Machine (SVM) like learning system to handle multi-label problems. Such problems are usually decomposed into many two-class problems but the expressive power of such a system can be weak [5, 7]. We explore a new direct approach. It is based on a large margin ranking system that shares a lot of common properties with SVMs. We tested it on a Yeast gene functional classification problem with positive results.