This paper introduces a Monte-Carlo algorithm for online planning in large POMDPs. The algorithm combines a Monte-Carlo update of the agent's belief state with a Monte-Carlo tree search from the current belief state. The new algorithm, POMCP, has two important properties. First, Monte-Carlo sampling is used to break the curse of dimensionality both during belief state updates and during planning. Second, only a black box simulator of the POMDP is required, rather than explicit probability distributions.
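The two Monte-Carlo components can be illustrated with a minimal sketch on a hypothetical two-state toy problem. The domain, its rewards, and the flat UCB1 bandit at the root (standing in for POMCP's full search tree) are all our own simplifications; only the black-box simulator interface G(s, a) -> (s', o, r) follows the setting described above.

```python
import math
import random
from collections import defaultdict

ACTIONS = ['probe', 'guess0', 'guess1']

def step(state, action):
    """Black-box generative simulator G(s, a) -> (s', o, r) for an invented
    two-state POMDP: 'probe' gives a noisy observation of the hidden state,
    a guess ends the episode with +/-10 reward."""
    if action == 'probe':
        obs = state if random.random() < 0.85 else 1 - state
        return state, obs, -1.0
    return state, None, 10.0 if int(action[-1]) == state else -10.0

def update_belief(particles, action, observation, n=200):
    """Monte-Carlo belief update: push sampled particles through the
    simulator and keep successors whose sampled observation matches the
    real one (rejection sampling)."""
    new = []
    while len(new) < n:
        s2, obs, _ = step(random.choice(particles), action)
        if obs == observation:
            new.append(s2)
    return new

def rollout(state, depth):
    """Random-policy rollout used to estimate leaf values."""
    if depth == 0:
        return 0.0
    action = random.choice(ACTIONS)
    state, _, reward = step(state, action)
    return reward + (rollout(state, depth - 1) if action == 'probe' else 0.0)

def plan(particles, n_sims=2000, depth=3):
    """Monte-Carlo planning from a particle belief: each simulation starts
    from a sampled particle; UCB1 selects the root action."""
    value, count = defaultdict(float), defaultdict(int)
    for i in range(1, n_sims + 1):
        state = random.choice(particles)
        action = max(ACTIONS, key=lambda a: float('inf') if count[a] == 0
                     else value[a] / count[a] + math.sqrt(2 * math.log(i) / count[a]))
        s2, _, reward = step(state, action)
        if action == 'probe':
            reward += rollout(s2, depth - 1)
        value[action] += reward
        count[action] += 1
    return max(ACTIONS, key=lambda a: value[a] / count[a])

random.seed(0)
belief = [0, 1] * 100                            # uniform prior particles
belief = update_belief(belief, 'probe', observation=1)  # belief now favours state 1
best = plan(belief)
```

Note that neither step ever touches explicit transition or observation probabilities: the belief update and the planner interact with the problem only through `step`.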
Precise coordinated planning enables safe and highly efficient motion when many robots must work together in tight spaces, but it would normally require centralised control of all devices, which is difficult to scale. We demonstrate a new, purely distributed technique based on Gaussian Belief Propagation for multi-robot planning problems formulated as a generic factor graph defining dynamics and collision constraints. We show that our method enables extremely high-performance collaborative planning in a simulated road traffic scenario, where vehicles are able to cross each other at a busy multi-lane junction while maintaining much higher average speeds than with alternative distributed planning techniques. We encourage the reader to view the accompanying video demonstration of this work at https://youtu.be/5d4LXbxgxaY.
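As a rough illustration of Gaussian Belief Propagation in the information (canonical) form typically used on such factor graphs, the sketch below runs GBP on an invented two-variable graph: two scalar variables with Gaussian priors and one pairwise "keep a gap" factor stand in for robot states and an inter-robot constraint. All numbers and factors are illustrative, not the paper's model; the point is that per-variable message passing recovers the marginals of the joint Gaussian.

```python
import numpy as np

prior_lam = np.array([1.0, 1.0])   # precisions of the unary prior factors
prior_mu = np.array([0.0, 4.0])    # prior means of x1, x2
pair_lam = 2.0                     # precision of the pairwise factor
gap = 1.0                          # the factor prefers x2 - x1 = gap

# Ground truth: assemble the joint information form (eta = Lam @ mean) and
# solve directly. The residual r = x2 - x1 - gap has Jacobian J = [-1, 1].
J = np.array([-1.0, 1.0])
Lam = np.diag(prior_lam) + pair_lam * np.outer(J, J)
eta = prior_lam * prior_mu + pair_lam * gap * J
exact = np.linalg.solve(Lam, eta)

def gbp_message(b_lam, b_eta, eta_s, eta_r):
    # Absorb the sender's current belief into the pairwise factor, then
    # marginalise the sender out (scalar Schur complement); returns the
    # information-form message (lam, eta) for the recipient.
    l = pair_lam + b_lam
    return pair_lam - pair_lam ** 2 / l, eta_r + pair_lam * (b_eta + eta_s) / l

# One message in each direction; on a tree, a single sweep is exact.
eta_f = pair_lam * gap * J   # the factor's eta, split over (x1, x2)
m12 = gbp_message(prior_lam[0], prior_lam[0] * prior_mu[0], eta_f[0], eta_f[1])
m21 = gbp_message(prior_lam[1], prior_lam[1] * prior_mu[1], eta_f[1], eta_f[0])

# Each variable's belief = its prior plus incoming messages.
bel_lam = prior_lam + np.array([m21[0], m12[0]])
bel_eta = prior_lam * prior_mu + np.array([m21[1], m12[1]])
means = bel_eta / bel_lam   # per-variable marginal means
```

Each `gbp_message` call uses only quantities local to one edge of the graph, which is what makes the scheme purely distributed: each robot can compute and exchange its messages with neighbours alone.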
It is well known that the problems of stochastic planning and probabilistic inference are closely related. This paper makes several contributions in this context for factored spaces, where computing solutions is challenging. First, we analyze the recently developed SOGBOFA heuristic, which performs stochastic planning by building an explicit computation graph capturing an approximate aggregate simulation of the dynamics. We show that the values computed by this algorithm are identical to the approximation provided by Belief Propagation (BP). Second, as a consequence of this observation, we show how ideas on lifted BP can be used to develop a lifted version of SOGBOFA. Unlike implementations of lifted BP, Lifted SOGBOFA has a very simple implementation as a dynamic programming version of the original graph construction. Third, we show that the idea of graph construction for aggregate simulation can be used to solve marginal MAP (MMAP) problems in Bayesian networks, where MAP variables are constrained to be at roots of the network. This yields a novel algorithm for MMAP for this subclass. An experimental evaluation illustrates the advantage of Lifted SOGBOFA for planning.
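A minimal sketch of the aggregate-simulation idea, assuming an invented two-variable noisy-OR domain: the marginal probability of each state variable is propagated through the dynamics as if the variables were independent (the same independence assumption under which the computed values coincide with BP), and expected reward is accumulated along the way. SOGBOFA additionally performs gradient search over action marginals on this computation graph, which the sketch omits.

```python
def step_marginals(p, a):
    # One step of aggregate simulation on invented noisy-OR dynamics:
    # the action tends to switch x1 on, and x1 tends to switch x2 on.
    # Parents are treated as independent Bernoullis with the given marginals.
    p1 = 1.0 - (1.0 - p['x1']) * (1.0 - 0.8 * a)
    p2 = 1.0 - (1.0 - p['x2']) * (1.0 - 0.5 * p['x1'])
    return {'x1': p1, 'x2': p2}

def rollout_value(a, horizon=3):
    # Expected cumulative reward (reward = P(x2 = 1) per step) of repeating
    # the action marginal a. The whole computation is a closed-form scalar
    # expression in a -- the explicit computation graph that SOGBOFA would
    # build symbolically and differentiate.
    p = {'x1': 0.0, 'x2': 0.0}
    total = 0.0
    for _ in range(horizon):
        p = step_marginals(p, a)
        total += p['x2']
    return total
```

Because the value is a smooth function of the action marginal `a` (here, `rollout_value(1.0)` exceeds `rollout_value(0.0)`), gradient-based search over actions becomes possible even when the underlying discrete action space is large.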
This paper introduces a technique for planning in hierarchical belief spaces and demonstrates the idea in an autonomous assembly task. The objective is to effectively propagate belief across multiple levels of abstraction to make control decisions that expose useful state information, manage uncertainty and risk, and actively satisfy task specifications. This approach is demonstrated by performing multiple instances of a simple class of assembly tasks using the uBot-6 mobile manipulator with a two-level hierarchy that manages uncertainty over objects and assembly geometry. The result is a method for composing a sequence of manual interactions that causes changes to the environment and exposes new information in order to support the belief that the task is satisfied. This approach has the added virtue of providing a natural way to accomplish tasks while simultaneously suppressing errors and mitigating risk. We compare performance in an example assembly task against a baseline hybrid approach that combines uncertainty management with a symbolic planner, and show statistically significant improvement in task outcomes. In additional demonstrations that challenge the system, we highlight useful artifacts of the approach: risk management and autonomous recovery from unexpected events.
Replanning via determinization is a recent, popular approach for online planning in MDPs. In this paper we adapt this idea to classical, non-stochastic domains with partial information and sensing actions, presenting a new planner: SDR (Sample, Determinize, Replan). At each step we generate a solution plan to a classical planning problem induced by the original problem. We execute this plan as long as it is safe to do so; when this is no longer the case, we replan. The classical planning problem we generate is based on the translation-based approach for conformant planning introduced by Palacios and Geffner, in which the state of the classical planning problem captures the belief state of the agent in the original problem. Unfortunately, when this method is applied to planning problems with sensing, it yields a planning problem that is both non-deterministic and typically very large. Our main contribution is the introduction of state sampling techniques for overcoming these two problems. In addition, we introduce a novel, lazy, regression-based method for querying the agent's belief state at run-time. We provide a comprehensive experimental evaluation of the planner, showing that it scales better than the state-of-the-art CLG planner on existing benchmark problems, while also highlighting its weaknesses on new domains. We also discuss the planner's theoretical guarantees.
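The Sample-Determinize-Replan loop can be sketched on a hypothetical two-door domain: the robot reaches the goal through door A if a hidden bit h is 0, through door B if h is 1, and a 'sense' action observes h. The domain, the stubbed "classical planner", and the safety test are all illustrative inventions; only the loop structure (sample a state, plan for the determinized problem, execute while safe, replan otherwise) follows the description above.

```python
import random

def classical_plan(sampled_h, belief):
    # Stand-in for the determinized classical planner: if the belief is
    # still ambiguous, plan to sense first, then head for the door that
    # is open in the sampled state.
    plan = ['sense'] if len(belief) > 1 else []
    plan.append('go_A' if sampled_h == 0 else 'go_B')
    return plan

def safe(action, belief):
    # An action is safe iff it cannot fail in any state of the belief.
    if action == 'sense':
        return True
    need = 0 if action == 'go_A' else 1
    return belief == {need}

def sdr_episode(true_h, seed=0):
    rng = random.Random(seed)
    belief, pos, trace = {0, 1}, 'start', []
    while pos != 'goal':
        sampled = rng.choice(sorted(belief))     # Sample a state
        plan = classical_plan(sampled, belief)   # Determinize (stubbed)
        for action in plan:
            if not safe(action, belief):
                break                            # Replan from updated belief
            trace.append(action)
            if action == 'sense':
                belief = {true_h}                # observation prunes the belief
            else:
                pos = 'goal'
    return trace
```

For example, `sdr_episode(true_h=1)` senses, discovers the unsampled value if it guessed wrong, replans, and goes through door B; the unsafe `go_A` is never executed.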