Goto

Collaborating Authors

 Planning & Scheduling


On-line Policy Improvement using Monte-Carlo Search

Neural Information Processing Systems

We present a Monte-Carlo simulation algorithm for real-time policy improvement of an adaptive controller. In the Monte-Carlo sim(cid:173) ulation, the long-term expected reward of each possible action is statistically measured, using the initial policy to make decisions in each step of the simulation. The action maximizing the measured expected reward is then taken, resulting in an improved policy. Our algorithm is easily parallelizable and has been implemented on the IBM SP! and SP2 parallel-RISC supercomputers. We have obtained promising initial results in applying this algo(cid:173) rithm to the domain of backgammon.


Learning to Schedule Straight-Line Code

Neural Information Processing Systems

Program execution speed on modem computers is sensitive, by a factor of two or more, to the order in which instructions are presented to the proces(cid:173) sor. To realize potential execution efficiency, an optimizing compiler must employ a heuristic algorithm for instruction scheduling. Such algorithms are painstakingly hand-crafted, which is expensive and time-consuming. We show how to cast the instruction scheduling problem as a learning task, ob(cid:173) taining the heuristic scheduling algorithm automatically. Our focus is the narrower problem of scheduling straight-line code (also called basic blocks of instructions).


ARA*: Anytime A* with Provable Bounds on Sub-Optimality

Neural Information Processing Systems

In real world planning problems, time for deliberation is often limited. Anytime planners are well suited for these problems: they find a feasi- ble solution quickly and then continually work on improving it until time runs out. In this paper we propose an anytime heuristic search, ARA, which tunes its performance bound based on available search time. It starts by finding a suboptimal solution quickly using a loose bound, then tightens the bound progressively as time allows. Given enough time it finds a provably optimal solution.


Auction Mechanism Design for Multi-Robot Coordination

Neural Information Processing Systems

The design of cooperative multi-robot systems is a highly active research area in robotics. Two lines of research in particular have generated inter- est: the solution of large, weakly coupled MDPs, and the design and im- plementation of market architectures. We propose a new algorithm which joins together these two lines of research. For a class of coupled MDPs, our algorithm automatically designs a market architecture which causes a decentralized multi-robot system to converge to a consistent policy. We can show that this policy is the same as the one which would be produced by a particular centralized planning algorithm.


Planning for Markov Decision Processes with Sparse Stochasticity

Neural Information Processing Systems

Planning algorithms designed for deterministic worlds, such as A* search, usually run much faster than algorithms designed for worlds with uncertain action outcomes, such as value iteration. Real-world planning problems often exhibit uncertainty, which forces us to use the slower algorithms to solve them. Many real-world planning problems exhibit sparse uncertainty: there are long sequences of deterministic actions which accomplish tasks like moving sensor platforms into place, inter- spersed with a small number of sensing actions which have uncertain out- comes. In this paper we describe a new planning algorithm, called MCP (short for MDP Compression Planning), which combines A* search with value iteration for solving Stochastic Shortest Path problem in MDPs with sparse stochasticity. We present experiments which show that MCP can run substantially faster than competing planners in domains with sparse uncertainty; these experiments are based on a simulation of a ground robot cooperating with a helicopter to fill in a partial map and move to a goal location. In deterministic planning problems, optimal paths are acyclic: no state is visited more than once. Because of this property, algorithms like A* search can guarantee that they visit each state in the state space no more than once.


Off-Road Obstacle Avoidance through End-to-End Learning

Neural Information Processing Systems

We describe a vision-based obstacle avoidance system for off-road mobile robots. The system is trained from end to end to map raw in put images to steering angles. It is trained in supervised mode to predict the steering angles provided by a human driver during training r uns collected in a wide variety of terrains, weather conditions, lighting conditions, and obstacle types. The robot is a 50cm off-road truck, with two f orwardpointing wireless color cameras. A remote computer process es the video and controls the robot via radio. The learning system is a lar ge 6-layer convolutional network whose input is a single left/right pair of unprocessed low-resolution images.


Monte-Carlo Planning in Large POMDPs

Neural Information Processing Systems

This paper introduces a Monte-Carlo algorithm for online planning in large POMDPs. The algorithm combines a Monte-Carlo update of the agent's belief state with a Monte-Carlo tree search from the current belief state. The new algorithm, POMCP, has two important properties. First, Monte-Carlo sampling is used to break the curse of dimensionality both during belief state updates and during planning. Second, only a black box simulator of the POMDP is required, rather than explicit probability distributions.


Variance Reduction in Monte-Carlo Tree Search

Neural Information Processing Systems

Monte-Carlo Tree Search (MCTS) has proven to be a powerful, generic planning technique for decision-making in single-agent and adversarial environments. The stochastic nature of the Monte-Carlo simulations introduces errors in the value estimates, both in terms of bias and variance. Whilst reducing bias (typically through the addition of domain knowledge) has been studied in the MCTS literature, comparatively little effort has focused on reducing variance. This is somewhat surprising, since variance reduction techniques are a well-studied area in classical statistics. In this paper, we examine the application of some standard techniques for variance reduction in MCTS, including common random numbers, antithetic variates and control variates.


Action-Model Based Multi-agent Plan Recognition

Neural Information Processing Systems

Multi-Agent Plan Recognition (MAPR) aims to recognize dynamic team structures and team behaviors from the observed team traces (activity sequences) of a set of intelligent agents. Previous MAPR approaches required a library of team activity sequences (team plans) be given as input. However, collecting a library of team plans to ensure adequate coverage is often difficult and costly. In this paper, we relax this constraint, so that team plans are not required to be provided beforehand. We assume instead that a set of action models are available.


Variational Planning for Graph-based MDPs

Neural Information Processing Systems

Markov Decision Processes (MDPs) are extremely useful for modeling and solving sequential decision making problems. Graph-based MDPs provide a compact representation for MDPs with large numbers of random variables. However, the complexity of exactly solving a graph-based MDP usually grows exponentially in the number of variables, which limits their application. We present a new variational framework to describe and solve the planning problem of MDPs, and derive both exact and approximate planning algorithms. In particular, by exploiting the graph structure of graph-based MDPs, we propose a factored variational value iteration algorithm in which the value function is first approximated by the multiplication of local-scope value functions, then solved by minimizing a Kullback-Leibler (KL) divergence. The KL divergence is optimized using the belief propagation algorithm, with complexity exponential in only the cluster size of the graph.