Some recent work in conditional planning has proposed reachability heuristics to improve planner scalability, but much of it lacks a formal description of the properties of its distance estimates. To place previous work in context and to extend work on heuristics for conditional planning, we provide a formal basis for distance estimates between belief states. We define the distance between belief states by aggregating underlying state distance measures, and we present several techniques for aggregating state distances together with their associated properties. Many existing heuristics exhibit a subset of these properties; to provide a standardized comparison, we present several generalizations of planning-graph heuristics implemented in a single planner.
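As a concrete illustration of the aggregation idea, the sketch below estimates a belief-state distance by taking, for each state in the source belief, the cheapest underlying distance into the target belief, and then aggregating those per-state costs. All names here, and the Hamming-style state distance, are illustrative assumptions, not the paper's actual measures.

```python
def belief_distance(b1, b2, state_dist, aggregate=max):
    """Distance from belief b1 to belief b2 (each a collection of states):
    aggregate, over states in b1, the cheapest state distance into b2."""
    per_state = [min(state_dist(s, t) for t in b2) for s in b1]
    return aggregate(per_state)

def hamming(s, t):
    """Illustrative state distance: size of the symmetric difference
    between two states represented as sets of propositions."""
    return len(s ^ t)

b1 = [frozenset({"a"}), frozenset({"a", "b"})]
b2 = [frozenset({"a", "b"})]
worst_case = belief_distance(b1, b2, hamming, aggregate=max)  # pessimistic
best_case = belief_distance(b1, b2, hamming, aggregate=min)   # optimistic
```

Swapping `aggregate` among `max`, `min`, and `sum` mirrors the kind of choice the paper formalizes: each aggregation yields a distance estimate with different properties, such as optimism versus pessimism about which underlying state is the true one.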
This paper introduces a Monte-Carlo algorithm for online planning in large POMDPs. The algorithm combines a Monte-Carlo update of the agent's belief state with a Monte-Carlo tree search from the current belief state. The new algorithm, POMCP, has two important properties. First, Monte-Carlo sampling is used to break the curse of dimensionality both during belief state updates and during planning. Second, only a black box simulator of the POMDP is required, rather than explicit probability distributions.
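The two Monte-Carlo components can be sketched as follows. This is a deliberate simplification: a rejection-sampling particle update plus flat Monte-Carlo action selection, not POMCP's full UCT tree search, and the black-box `simulate(s, a) -> (s2, o, r)` signature is an assumption for the example.

```python
import random

def particle_update(particles, action, obs, simulate, n=100):
    """Monte-Carlo belief update: keep sampled successor states whose
    simulated observation matches the one actually received."""
    new = []
    while len(new) < n:
        s = random.choice(particles)
        s2, o, _ = simulate(s, action)
        if o == obs:
            new.append(s2)
    return new

def mc_plan(particles, actions, simulate, depth=10, n_sims=200, gamma=0.95):
    """Flat Monte-Carlo planning: score each candidate first action by
    random rollouts from states sampled out of the particle belief."""
    best_a, best_v = None, float("-inf")
    for a0 in actions:
        total = 0.0
        for _ in range(n_sims):
            s, a, disc, ret = random.choice(particles), a0, 1.0, 0.0
            for _ in range(depth):
                s, _, r = simulate(s, a)
                ret += disc * r
                disc *= gamma
                a = random.choice(actions)  # uniform random rollout policy
            total += ret
        if total / n_sims > best_v:
            best_a, best_v = a0, total / n_sims
    return best_a
```

Note that both routines touch the POMDP only through `simulate`, which is the abstract's point about requiring a black-box simulator rather than explicit probability distributions.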
This paper introduces a technique for planning in hierarchical belief spaces and demonstrates the idea in an autonomous assembly task. The objective is to effectively propagate belief across multiple levels of abstraction to make control decisions that expose useful state information, manage uncertainty and risk, and actively satisfy task specifications. This approach is demonstrated by performing multiple instances of a simple class of assembly tasks using the uBot-6 mobile manipulator with a two-level hierarchy that manages uncertainty over objects and assembly geometry. The result is a method for composing a sequence of manual interactions that causes changes to the environment and exposes new information in order to support the belief that the task is satisfied. This approach has the added virtue of providing a natural way to accomplish tasks while simultaneously suppressing errors and mitigating risk. We compare performance in an example assembly task against a baseline hybrid approach that combines uncertainty management with a symbolic planner, and show statistically significant improvement in task outcomes. In additional demonstrations that challenge the system, we highlight useful byproducts of the approach: risk management and autonomous recovery from unexpected events.
It is well known that the problems of stochastic planning and probabilistic inference are closely related. This paper makes several contributions in this context for factored spaces, where the complexity of computing solutions is a central challenge. First, we analyze the recently developed SOGBOFA heuristic, which performs stochastic planning by building an explicit computation graph capturing an approximate aggregate simulation of the dynamics. It is shown that the values computed by this algorithm are identical to the approximation provided by Belief Propagation (BP). Second, as a consequence of this observation, we show how ideas on lifted BP can be used to develop a lifted version of SOGBOFA. Unlike implementations of lifted BP, Lifted SOGBOFA has a very simple implementation as a dynamic programming version of the original graph construction. Third, we show that the idea of graph construction for aggregate simulation can be used to solve marginal MAP (MMAP) problems in Bayesian networks, where the MAP variables are constrained to be at the roots of the network. This yields a novel algorithm for MMAP for this subclass. An experimental evaluation illustrates the advantage of Lifted SOGBOFA for planning.
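To give a flavor of aggregate simulation in a deliberately schematic form (this is not the actual SOGBOFA graph construction, and the dynamics below are made up for the example), one can propagate per-variable marginal probabilities through factored dynamics while treating the variables as independent, the same independence approximation that underlies loopy Belief Propagation:

```python
def aggregate_step(marginals, dynamics):
    """One step of approximate aggregate simulation: each next-step
    marginal is computed from the current marginals under an
    independence assumption across state variables."""
    return {v: f(marginals) for v, f in dynamics.items()}

# Illustrative factored dynamics over two binary variables:
# x turns on with probability 0.5 and then stays on; y is a
# noisy-or of x and y (all probabilities are invented).
dynamics = {
    "x": lambda m: m["x"] + 0.5 * (1 - m["x"]),
    "y": lambda m: 1 - (1 - 0.8 * m["x"]) * (1 - 0.3 * m["y"]),
}

m = {"x": 0.0, "y": 0.0}
for _ in range(3):
    m = aggregate_step(m, dynamics)
```

An expected-reward value computed on top of such marginals can then be optimized over action choices, which is the planning side of the construction the paper analyzes.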
Replanning via determinization is a recent, popular approach for online planning in MDPs. In this paper we adapt this idea to classical, non-stochastic domains with partial information and sensing actions, presenting a new planner: SDR (Sample, Determinize, Replan). At each step we generate a solution plan to a classical planning problem induced by the original problem. We execute this plan as long as it is safe to do so. When this is no longer the case, we replan.
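The execute-replan loop described above can be sketched as follows; all of the callables are hypothetical placeholders standing in for SDR's components, not its actual interface.

```python
def sample_determinize_replan(belief, goal_reached, sample_state,
                              determinize, classical_plan,
                              is_safe, execute):
    """Schematic SDR loop: build a classical problem from the current
    belief, execute its plan while execution remains safe, and replan
    once safety can no longer be shown."""
    while not goal_reached(belief):
        problem = determinize(sample_state(belief))
        for action in classical_plan(problem):
            if not is_safe(belief, action):
                break  # safety lost: fall through and replan
            belief = execute(belief, action)
            if goal_reached(belief):
                break
    return belief
```

The key design point mirrored here is that all planning happens in a classical (deterministic, fully observable) problem induced from the belief, so any off-the-shelf classical planner can be plugged in as `classical_plan`.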