Goto

Collaborating Authors

 Uncertainty


Rules and Similarity in Concept Learning

Neural Information Processing Systems

This paper argues that two apparently distinct modes of generalizing concepts -abstracting rules and computing similarity to exemplars - should both be seen as special cases of a more general Bayesian learning framework. Bayesexplains the specific workings of these two modes - which rules are abstracted, how similarity is measured - as well as why generalization shouldappear rule-or similarity-based in different situations. This analysis also suggests why the rules/similarity distinction, even if not computationally fundamental, may still be useful at the algorithmic level as part of a principled approximation to fully Bayesian learning.


Mixture Density Estimation

Neural Information Processing Systems

Gaussian mixtures (or so-called radial basis function networks) for density estimation provide a natural counterpart to sigmoidal neural networksfor function fitting and approximation. In both cases, it is possible to give simple expressions for the iterative improvement ofperformance as components of the network are introduced one at a time. In particular, for mixture density estimation we show that a k-component mixture estimated by maximum likelihood (or by an iterative likelihood improvement that we introduce) achieves log-likelihood within order 1/k of the log-likelihood achievable by any convex combination. Consequences for approximation and estimation usingKullback-Leibler risk are also given. A Minimum Description Length principle selects the optimal number of components kthat minimizes the risk bound. 1 Introduction In density estimation, Gaussian mixtures provide flexible-basis representations for densities that can be used to model heterogeneous data in high dimensions. The main theme of the paper is to give approximation and estimation bounds of arbitrary densities by finite mixture densities.


Bayesian Map Learning in Dynamic Environments

Neural Information Processing Systems

We consider the problem of learning a grid-based map using a robot with noisy sensors and actuators. We compare two approaches: online EM, where the map is treated as a fixed parameter, and Bayesian inference, where the map is a (matrix-valued) random variable. We show that even on a very simple example, online EM can get stuck in local minima, which causes the robot to get "lost" and the resulting map to be useless. By contrast, the Bayesian approach, by maintaining multiple hypotheses, is much more robust. Wethen introduce a method for approximating the Bayesian solution, called Rao-Blackwellised particle filtering. We show that this approximation, when coupled with an active learning strategy, is fast but accurate.


Approximate Inference A lgorithms for Two-Layer Bayesian Networks

Neural Information Processing Systems

We present a class of approximate inference algorithms for graphical models of the QMR-DT type. We give convergence rates for these algorithms andfor the Jaakkola and Jordan (1999) algorithm, and verify these theoretical predictions empirically.


Maximum Entropy Discrimination

Neural Information Processing Systems

We present a general framework for discriminative estimation based on the maximum entropy principle and its extensions. All calculations involvedistributions over structures and/or parameters rather than specific settings and reduce to relative entropy projections. This holds even when the data is not separable within the chosen parametric class, in the context of anomaly detection rather than classification, or when the labels in the training set are uncertain or incomplete. Support vector machines are naturally subsumed under thisclass and we provide several extensions. We are also able to estimate exactly and efficiently discriminative distributions over tree structures of class-conditional models within this framework.


Policy Gradient Methods for Reinforcement Learning with Function Approximation

Neural Information Processing Systems

Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining apolicy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent ofthe value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.


Learning Factored Representations for Partially Observable Markov Decision Processes

Neural Information Processing Systems

The problem of reinforcement learning in a non-Markov environment is explored using a dynamic Bayesian network, where conditional independence assumptionsbetween random variables are compactly represented by network parameters. The parameters are learned online, and approximations areused to perform inference and to compute the optimal value function. The relative effects of inference and value function approximations onthe quality of the final policy are investigated, by learning to solve a moderately difficult driving task. The two value function approximations, linearand quadratic, were found to perform similarly, but the quadratic model was more sensitive to initialization. Both performed below thelevel of human performance on the task.


Policy Search via Density Estimation

Neural Information Processing Systems

We propose a new approach to the problem of searching a space of stochastic controllers for a Markov decision process (MDP) or a partially observable Markov decision process (POMDP). Following several other authors, our approach is based on searching in parameterized families of policies (for example, via gradient descent) to optimize solution quality. However,rather than trying to estimate the values and derivatives of a policy directly, we do so indirectly using estimates for the probability densitiesthat the policy induces on states at the different points in time. This enables our algorithms to exploit the many techniques for efficient and robust approximate density propagation in stochastic systems. Weshow how our techniques can be applied both to deterministic propagation schemes (where the MDP's dynamics are given explicitly in compact form,) and to stochastic propagation schemes (where we have access only to a generative model, or simulator, of the MDP).


Learning from User Feedback in Image Retrieval Systems

Neural Information Processing Systems

We formulate the problem of retrieving images from visual databases as a problem of Bayesian inference. This leads to natural and effective solutions for two of the most challenging issues in the design of a retrieval system: providing support for region-based queries without requiring prior image segmentation, and accounting for user-feedback during a retrieval session. We present a new learning algorithm that relies on belief propagation to account for both positive and negative examples of the user's interests.