Learning Graphical Models
An Application of Markov Random Fields to Range Sensing
Diebel, James, Thrun, Sebastian
This paper describes a highly successful application of MRFs to the problem ofgenerating high-resolution range images. A new generation of range sensors combines the capture of low-resolution range images with the acquisition of registered high-resolution camera images. The MRF in this paper exploits the fact that discontinuities in range and coloring tend to co-align. This enables it to generate high-resolution, low-noise range images by integrating regular camera images into the range data. We show that by using such an MRF, we can substantially improve over existing range imaging technology.
Norepinephrine and Neural Interrupts
Experimental data indicate that norepinephrine is critically involved in aspects of vigilance and attention. Previously, we considered the function ofthis neuromodulatory system on a time scale of minutes and longer, and suggested that it signals global uncertainty arising from gross changes in environmental contingencies. However, norepinephrine is also known to be activated phasically by familiar stimuli in welllearned tasks.Here, we extend our uncertainty-based treatment of norepinephrine tothis phasic mode, proposing that it is involved in the detection and reaction to state uncertainty within a task. This role of norepinephrine canbe understood through the metaphor of neural interrupts.
Efficient estimation of hidden state dynamics from spike trains
Danoczy, Marton G., Hahnloser, Richard H. R.
Neurons can have rapidly changing spike train statistics dictated by the underlying network excitability or behavioural state of an animal. To estimate the time course of such state dynamics from single-or multiple neuronrecordings, we have developed an algorithm that maximizes the likelihood of observed spike trains by optimizing the state lifetimes and the state-conditional interspike-interval (ISI) distributions. Our nonparametric algorithmis free of time-binning and spike-counting problems and has the computational complexity of a Mixed-state Markov Model operating on a state sequence of length equal to the total number ofrecorded spikes. As an example, we fit a two-state model to paired recordings of premotor neurons in the sleeping songbird. We find that the two state-conditional ISI functions are highly similar to the ones measured duringwaking and singing, respectively.
Bayesian models of human action understanding
Baker, Chris, Saxe, Rebecca, Tenenbaum, Joshua B.
We present a Bayesian framework for explaining how people reason about and predict the actions of an intentional agent, based on observing itsbehavior. Action-understanding is cast as a problem of inverting a probabilistic generative model, which assumes that agents tend to act rationally in order to achieve their goals given the constraints of their environment. Workingin a simple sprite-world domain, we show how this model can be used to infer the goal of an agent and predict how the agent will act in novel situations or when environmental constraints change. The model provides a qualitative account of several kinds of inferences that preverbal infants have been shown to perform, and also fits quantitative predictionsthat adult observers make in a new experiment.
On Local Rewards and Scaling Distributed Reinforcement Learning
We consider the scaling of the number of examples necessary to achieve good performance in distributed, cooperative, multi-agent reinforcement learning, as a function of the the number of agents n. We prove a worstcase lowerbound showing that algorithms that rely solely on a global reward signal to learn policies confront a fundamental limit: They require anumber of real-world examples that scales roughly linearly in the number of agents. For settings of interest with a very large number of agents, this is impractical. We demonstrate, however, that there is a class of algorithms that, by taking advantage of local reward signals in large distributed Markov Decision Processes, are able to ensure good performance witha number of samples that scales as O(log n). This makes them applicable even in settings with a very large number of agents n.
Maximum Margin Semi-Supervised Learning for Structured Variables
Altun, Y., McAllester, D., Belkin, M.
Many real-world classification problems involve the prediction of multiple interdependent variables forming some structural dependency. Recentprogress in machine learning has mainly focused on supervised classification of such structured variables. In this paper, we investigate structured classification in a semi-supervised setting. We present a discriminative approach that utilizes the intrinsic geometry ofinput patterns revealed by unlabeled data points and we derive a maximum-margin formulation of semi-supervised learning for structured variables.
Policy-Gradient Methods for Planning
Probabilistic temporal planning attempts to find good policies for acting in domains with concurrent durative tasks, multiple uncertain outcomes, and limited resources. These domains are typically modelled as Markov decision problems and solved using dynamic programming methods. This paper demonstrates the application of reinforcement learning -- in the form of a policy-gradient method -- to these domains. Our emphasis is large domains that are infeasible for dynamic programming. Our approach isto construct simple policies, or agents, for each planning task. The result is a general probabilistic temporal planner, named the Factored Policy-Gradient Planner (FPG-Planner), which can handle hundreds of tasks, optimising for probability of success, duration, and resource use.
Resource Allocation Among Agents with MDP-Induced Preferences
Allocating scarce resources among agents to maximize global utility is, in general, computationally challenging. We focus on problems where resources enable agents to execute actions in stochastic environments, modeled as Markov decision processes (MDPs), such that the value of a resource bundle is defined as the expected value of the optimal MDP policy realizable given these resources. We present an algorithm that simultaneously solves the resource-allocation and the policy-optimization problems. This allows us to avoid explicitly representing utilities over exponentially many resource bundles, leading to drastic (often exponential) reductions in computational complexity. We then use this algorithm in the context of self-interested agents to design a combinatorial auction for allocating resources. We empirically demonstrate the effectiveness of our approach by showing that it can, in minutes, optimally solve problems for which a straightforward combinatorial resource-allocation technique would require the agents to enumerate up to 2^100 resource bundles and the auctioneer to solve an NP-complete problem with an input of that size.
FluCaP: A Heuristic Search Planner for First-Order MDPs
Hoelldobler, S., Karabaev, E., Skvortsova, O.
We present a heuristic search algorithm for solving first-order Markov Decision Processes (FOMDPs). Our approach combines first-order state abstraction that avoids evaluating states individually, and heuristic search that avoids evaluating all states. Firstly, in contrast to existing systems, which start with propositionalizing the FOMDP and then perform state abstraction on its propositionalized version we apply state abstraction directly on the FOMDP avoiding propositionalization. This kind of abstraction is referred to as first-order state abstraction. Secondly, guided by an admissible heuristic, the search is restricted to those states that are reachable from the initial state. We demonstrate the usefulness of the above techniques for solving FOMDPs with a system, referred to as FluCaP (formerly, FCPlanner), that entered the probabilistic track of the 2004 International Planning Competition (IPC'2004) and demonstrated an advantage over other planners on the problems represented in first-order terms.
Anytime Point-Based Approximations for Large POMDPs
Pineau, J., Gordon, G., Thrun, S.
The Partially Observable Markov Decision Process has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However exact solutions in this framework are typically computationally intractable for all but the smallest problems. A well-known technique for speeding up POMDP solving involves performing value backups at specific belief points, rather than over the entire belief simplex. The efficiency of this approach, however, depends greatly on the selection of points. This paper presents a set of novel techniques for selecting informative belief points which work well in practice. The point selection procedure is combined with point-based value backups to form an effective anytime POMDP algorithm called Point-Based Value Iteration (PBVI). The first aim of this paper is to introduce this algorithm and present a theoretical analysis justifying the choice of belief selection technique. The second aim of this paper is to provide a thorough empirical comparison between PBVI and other state-of-the-art POMDP methods, in particular the Perseus algorithm, in an effort to highlight their similarities and differences. Evaluation is performed using both standard POMDP domains and realistic robotic tasks.