
Minimax Differential Dynamic Programming: An Application to Robust Biped Walking

Neural Information Processing Systems

We developed a robust control policy design method for high-dimensional state spaces by using differential dynamic programming with a minimax criterion. As an example, we applied our method to a simulated five-link biped robot. The results show lower joint torques from the optimal control policy compared to a hand-tuned PD servo controller. Results also show that the simulated biped robot can successfully walk with unknown disturbances that cause controllers generated by standard differential dynamic programming and the hand-tuned PD servo to fail. Learning to compensate for modeling error and previously unknown disturbances in conjunction with robust control design is also demonstrated.
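
For concreteness, the sketch below shows the step that distinguishes the minimax variant from standard DDP: in the backward pass, the local quadratic model of the Q-function is minimized over the control but maximized over an adversarial disturbance. This is a minimal numpy illustration under that assumption; the block names (Q_u, Q_uu, Q_uw, ...) follow the usual DDP quadratic-expansion convention, and the function itself is hypothetical, not the paper's implementation.

    import numpy as np

    def minimax_ddp_step(Q_u, Q_w, Q_uu, Q_ww, Q_uw):
        """One backward-pass step of a minimax DDP sketch.

        Solves the local saddle-point problem
            min_u max_w  Q_u'u + Q_w'w + .5 u'Q_uu u + .5 w'Q_ww w + u'Q_uw w,
        which requires Q_uu positive definite and Q_ww negative definite.
        Returns the control and worst-case-disturbance feedforward terms.
        """
        # Stationarity conditions stacked into one linear system:
        #   [Q_uu   Q_uw] [u]    [Q_u]
        #   [Q_uw'  Q_ww] [w] = -[Q_w]
        KKT = np.block([[Q_uu, Q_uw], [Q_uw.T, Q_ww]])
        rhs = -np.concatenate([Q_u, Q_w])
        sol = np.linalg.solve(KKT, rhs)
        n_u = Q_u.shape[0]
        return sol[:n_u], sol[n_u:]  # du (control), dw (worst-case disturbance)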


A Formulation for Minimax Probability Machine Regression

Neural Information Processing Systems

We formulate the regression problem as one of maximizing the minimum probability, symbolized by Ω, that future predicted outputs of the regression model will be within some ε bound of the true regression function. Our formulation is unique in that we obtain a direct estimate of this lower probability bound Ω. The proposed framework, minimax probability machine regression (MPMR), is based on the recently described minimax probability machine classification algorithm [Lanckriet et al.] and uses Mercer kernels to obtain nonlinear regression models. MPMR is tested on both toy and real-world data, verifying the accuracy of the Ω bound and the efficacy of the regression models.
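
As a rough illustration of the quantities involved, the sketch below fits a stand-in kernel regression model (RBF-kernel ridge regression, not the paper's MPM-based fit) and computes a distribution-free Chebyshev-style lower bound on the probability of landing within ±ε of the truth. The paper derives a direct estimate of Ω; this simpler bound and all parameter choices here are illustrative assumptions.

    import numpy as np

    def kernel_ridge_fit(X, y, gamma=1.0, lam=1e-3):
        # RBF-kernel ridge regression as a hypothetical stand-in for the
        # MPM-based regression fit described in the abstract.
        K = np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
        alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
        return alpha, K  # predictions on the training set are K @ alpha

    def omega_lower_bound(residuals, eps):
        # Distribution-free (Chebyshev-style) lower bound on the probability
        # that a prediction falls within +-eps of the truth. The paper gives
        # a direct estimate of Omega; this simpler bound is only illustrative.
        sigma2 = np.var(residuals)
        return max(0.0, 1.0 - sigma2 / eps**2)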


Adaptive Classification by Variational Kalman Filtering

Neural Information Processing Systems

In this paper we propose a probabilistic approach for adaptive inference of generalized nonlinear classification that combines the computational advantage of a parametric solution with the flexibility of sequential sampling techniques. We regard the parameters of the classifier as latent states in a first-order Markov process and propose an algorithm which can be regarded as a variational generalization of standard Kalman filtering. The variational Kalman filter is based on two novel lower bounds that enable us to use a non-degenerate distribution over the adaptation rate. An extensive empirical evaluation demonstrates that the proposed method is capable of inferring competitive classifiers both in stationary and non-stationary environments. Although we focus on classification, the algorithm is easily extended to other generalized nonlinear models.
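
The sketch below conveys the sequential flavor of this approach under simplifying assumptions: classifier weights are a random-walk latent state, and each labeled example triggers a Gaussian predict/correct cycle. It uses a plain extended-Kalman linearization with a fixed drift variance, whereas the paper's algorithm uses variational lower bounds and a non-degenerate distribution over the adaptation rate.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def ekf_logistic_step(m, P, x, y, drift=1e-3):
        """One sequential update of a Gaussian posterior N(m, P) over
        classifier weights, treated as a first-order Markov latent state.
        A plain extended-Kalman sketch, not the paper's variational filter."""
        P = P + drift * np.eye(len(m))      # predict: random-walk drift
        mu = sigmoid(m @ x)                 # predicted class probability
        H = mu * (1 - mu) * x               # Jacobian of the likelihood mean
        S = H @ P @ H + mu * (1 - mu)       # innovation variance
        K = P @ H / S                       # Kalman gain
        m = m + K * (y - mu)                # correct the mean
        P = P - np.outer(K, H @ P)          # correct the covariance
        return m, P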


Generalized² Linear² Models

Neural Information Processing Systems

We introduce the Generalized² Linear² Model, a statistical estimator which combines features of nonlinear regression and factor analysis.
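
A minimal sketch of the combination this describes, assuming the model takes the form Y ≈ g(UV) with a low-rank (factor-analysis-like) parameter matrix pushed through a GLM-style link g; the logistic link, squared loss, and plain gradient descent below are arbitrary illustrative choices, whereas the paper treats general links and losses.

    import numpy as np

    def g2l2_fit(Y, rank=2, lr=0.1, iters=500, rng=None):
        """Fit Y ~ g(U @ V) by gradient descent: a low-rank factorization
        pushed through a link function g. A toy sketch of the nonlinear
        regression / factor analysis combination named in the abstract."""
        rng = rng or np.random.default_rng(0)
        n, d = Y.shape
        U = 0.1 * rng.standard_normal((n, rank))
        V = 0.1 * rng.standard_normal((rank, d))
        g = lambda a: 1.0 / (1.0 + np.exp(-a))
        for _ in range(iters):
            A = U @ V
            R = (g(A) - Y) * g(A) * (1 - g(A))  # chain rule through the link
            U -= lr * R @ V.T                    # gradient step on the factors
            V -= lr * U.T @ R                    # gradient step on the loadings
        return U, V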


Intrinsic Dimension Estimation Using Packing Numbers

Neural Information Processing Systems

We propose a new algorithm to estimate the intrinsic dimension of data sets. The method is based on geometric properties of the data and requires neither parametric assumptions on the data-generating model nor input parameters to be set. The method is compared to a similar, widely used algorithm from the same family of geometric techniques. Experiments show that our method is more robust with respect to the data-generating distribution and more reliable in the presence of noise.
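
A minimal sketch of a packing-number dimension estimator, assuming the standard two-scale formula d = -(log M(r2) - log M(r1)) / (log r2 - log r1) for radii r1 < r2. The greedy packing count below depends on data order (a paper-style remedy would be averaging over random permutations), and all parameters are illustrative.

    import numpy as np

    def packing_number(X, r):
        """Greedy estimate of the r-packing number: the size of a maximal
        subset of X whose points are pairwise more than r apart."""
        centers = []
        for x in X:
            if all(np.linalg.norm(x - c) > r for c in centers):
                centers.append(x)
        return len(centers)

    def intrinsic_dimension(X, r1, r2):
        """Packing-number dimension estimate at two scales r1 < r2."""
        m1, m2 = packing_number(X, r1), packing_number(X, r2)
        return -(np.log(m2) - np.log(m1)) / (np.log(r2) - np.log(r1))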


Discriminative Learning for Label Sequences via Boosting

Neural Information Processing Systems

Well-known applications include part-of-speech (POS) tagging, named entity classification, information extraction, text segmentation and phoneme classification in text and speech processing [7], as well as problems like protein homology detection, secondary structure prediction or gene classification in computational biology [3]. Up to now, the predominant formalism for modeling and predicting label sequences has been based on Hidden Markov Models (HMMs) and variations thereof. Yet, despite this success, generative probabilistic models - of which HMMs are a special case - have two major shortcomings, and this paper is not the first to point them out. First, generative probabilistic models are typically trained using maximum likelihood estimation (MLE) for a joint sampling model of observation and label sequences. As has been emphasized frequently, MLE based on the joint probability model is inherently non-discriminative and thus may lead to suboptimal prediction accuracy.


Kernel Dependency Estimation

Neural Information Processing Systems

Jason Weston, Olivier Chapelle, Andre Elisseeff, Bernhard Schölkopf and Vladimir Vapnik* (Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany; *NEC Research Institute, Princeton, NJ 08540, USA)

We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be, for example, vectors, images, strings, trees or graphs. Such a task is made possible by employing similarity measures in both input and output spaces using kernel functions, thus embedding the objects into vector spaces. We experimentally validate our approach on several tasks: mapping strings to strings, pattern recognition, and reconstruction from partial images.

In this article we consider the rather general learning problem of finding a dependency between inputs x ∈ X and outputs y ∈ Y given a training set (x1, y1), ..., (xm, ym). This includes conventional pattern recognition and regression estimation. It also encompasses more complex dependency estimation tasks, e.g. the mapping of a certain class of strings to a certain class of graphs (as in text parsing) or the mapping of text descriptions to images.
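
A minimal sketch of this kernel-embedding idea, assuming the common KDE recipe: kernel PCA on the output kernel yields low-dimensional output scores, kernel ridge regression maps inputs to those scores, and the pre-image is approximated by the nearest training output. RBF kernels, the omission of kernel centering, and all parameters are illustrative assumptions rather than the paper's exact procedure.

    import numpy as np

    def rbf(A, B, gamma=1.0):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def kde_fit(X, Y, n_comp=5, lam=1e-3):
        """Kernel PCA scores of the output kernel (centering omitted for
        brevity), plus a kernel ridge map from inputs to those scores."""
        Kx, Ky = rbf(X, X), rbf(Y, Y)
        vals, vecs = np.linalg.eigh(Ky)
        Z = vecs[:, -n_comp:] * np.sqrt(np.maximum(vals[-n_comp:], 0))
        A = np.linalg.solve(Kx + lam * np.eye(len(X)), Z)
        return A, Z

    def kde_predict(X, Y, A, Z, x_new):
        """Pre-image by nearest neighbour among training outputs, a common
        simplification of the pre-image problem the paper addresses."""
        z_hat = rbf(x_new[None, :], X) @ A
        return Y[np.argmin(((Z - z_hat) ** 2).sum(-1))]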


Transductive and Inductive Methods for Approximate Gaussian Process Regression

Neural Information Processing Systems

Gaussian process regression allows a simple analytical treatment of exact Bayesian inference and has been found to provide good performance, yet it scales badly with the number of training data. In this paper we compare several approaches to scaling Gaussian process regression to large data sets: the subset of representers method, the reduced rank approximation, online Gaussian processes, and the Bayesian committee machine. Furthermore, we provide theoretical insight into some of our experimental results. We found that subset of representers methods can give good and particularly fast predictions for data sets with high and medium noise levels. On complex low-noise data sets, the Bayesian committee machine achieves significantly better accuracy, yet at a higher computational cost.
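
As one concrete example, the sketch below implements the subset-of-representers posterior mean, in which predictions are expanded on a subset of m training inputs only; the kernel matrices are assumed precomputed and the noise level is an illustrative parameter.

    import numpy as np

    def sor_gp_predict(K_mn, K_mm, y, K_star_m, noise=0.1):
        """Subset-of-representers GP prediction sketch. K_mn is the kernel
        between the m subset points and all n training points, K_mm the
        kernel among subset points, K_star_m between test and subset points.
        The posterior mean uses only m basis functions instead of n."""
        A = K_mn @ K_mn.T + noise**2 * K_mm
        alpha = np.linalg.solve(A, K_mn @ y)
        return K_star_m @ alpha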


Parametric Mixture Models for Multi-Labeled Text

Neural Information Processing Systems

We propose probabilistic generative models, called parametric mixture models (PMMs), for the multi-class, multi-labeled text categorization problem. Conventionally, a binary classification approach has been employed, in which whether or not a text belongs to a category is judged by a binary classifier trained for each category. In contrast, our approach can simultaneously detect multiple categories of a text using PMMs. We derive efficient learning and prediction algorithms for PMMs. We also empirically show that our method can significantly outperform the conventional binary methods when applied to multi-labeled text categorization using real World Wide Web pages.
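
A minimal sketch of the generative assumption, in the PMM1-style form where a multi-labeled document's word distribution is the uniform average of its categories' word distributions; the exhaustive subset search below is for illustration only, since the paper derives efficient learning and prediction algorithms.

    import numpy as np
    from itertools import combinations

    def pmm_loglik(counts, theta, labels):
        """Log-likelihood of a word-count vector under the parametric-mixture
        assumption: the document's word distribution is the average of the
        word distributions (rows of theta) of its labels."""
        mix = theta[list(labels)].mean(axis=0)
        return counts @ np.log(mix + 1e-12)

    def pmm_predict(counts, theta, max_labels=3):
        """Brute-force search over small label subsets for the best fit."""
        n_cat = theta.shape[0]
        best, best_ll = None, -np.inf
        for k in range(1, max_labels + 1):
            for labels in combinations(range(n_cat), k):
                ll = pmm_loglik(counts, theta, labels)
                if ll > best_ll:
                    best, best_ll = labels, ll
        return best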


Evidence Optimization Techniques for Estimating Stimulus-Response Functions

Neural Information Processing Systems

An essential step in understanding the function of sensory nervous systems is to characterize as accurately as possible the stimulus-response function (SRF) of the neurons that relay and process sensory information. One increasingly common experimental approach is to present a rapidly varying complex stimulus to the animal while recording the responses of one or more neurons, and then to directly estimate a functional transformation of the input that accounts for the neuronal firing. The estimation techniques usually employed, such as Wiener filtering or other correlation-based estimation of the Wiener or Volterra kernels, are equivalent to maximum likelihood estimation in a Gaussian-output-noise regression model. We explore the use of Bayesian evidence-optimization techniques to condition these estimates. We show that by learning hyperparameters that control the smoothness and sparsity of the transfer function it is possible to dramatically improve the quality of SRF estimates, as measured by their success in predicting responses to novel input.
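
As a simplified illustration of evidence optimization, the sketch below runs automatic relevance determination (ARD) for a linear stimulus-response filter: each filter coefficient gets its own prior precision, updated by MacKay-style fixed-point rules that maximize the evidence. This captures only the sparsity side; the smoothness-controlling hyperparameters mentioned in the abstract are omitted.

    import numpy as np

    def ard_srf(Phi, y, n_iter=50):
        """Evidence optimization (ARD) sketch for a linear SRF model
        y ~ N(Phi @ w, 1/beta), with per-coefficient prior precisions alpha."""
        n, d = Phi.shape
        alpha, beta = np.ones(d), 1.0
        for _ in range(n_iter):
            Sigma = np.linalg.inv(beta * Phi.T @ Phi + np.diag(alpha))
            m = beta * Sigma @ Phi.T @ y
            gamma = 1.0 - alpha * np.diag(Sigma)        # effective parameters
            alpha = gamma / (m**2 + 1e-12)              # prior precision update
            resid = y - Phi @ m
            beta = (n - gamma.sum()) / (resid @ resid)  # noise precision update
        return m, alpha, beta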