Deciding what's next? is a fundamental problem in robotics and Artificial Intelligence. Under belief space planning (BSP), in a partially observable setting, it involves calculating the expected accumulated belief-dependent reward, where the expectation is with respect to all future measurements. Since solving this general un-approximated problem quickly becomes intractable, state of the art approaches turn to approximations while still calculating planning sessions from scratch. In this work we propose a novel paradigm, Incremental BSP (iX-BSP), based on the key insight that calculations across planning sessions are similar in nature and can be appropriately re-used. We calculate the expectation incrementally by utilizing Multiple Importance Sampling techniques for selective re-sampling and re-use of measurement from previous planning sessions. The formulation of our approach considers general distributions and accounts for data association aspects. We demonstrate how iX-BSP could benefit existing approximations of the general problem, introducing iML-BSP, which re-uses calculations across planning sessions under the common Maximum Likelihood assumption. We evaluate both methods and demonstrate a substantial reduction in computation time while statistically preserving accuracy. The evaluation includes both simulation and real-world experiments considering autonomous vision-based navigation and SLAM. As a further contribution, we introduce to iX-BSP the non-integral wildfire approximation, allowing one to trade accuracy for computational performance by averting from updating re-used beliefs when they are "close enough". We evaluate iX-BSP under wildfire demonstrating a substantial reduction in computation time while controlling the accuracy sacrifice. We also provide analytical and empirical bounds of the effect wildfire holds over the objective value.
The paper argues that (1) stochastic planning should be used as a core problem domain for relational probabilistic models providing problems of interest that are challenging for current approaches and significant scope for extending their capabilities, (2) that symbolic dynamic programming solving such problems can be seen as a prime example of lifted inference in relational probabilistic problems, (3) that first order decision diagrams provide a useful tool to drive such lifted computations, and (4) that the resulting lifted inference is qualitatively different from what other approaches are providing. As a result, this relationship can be studied to the benefit of developing foundations for relational probabilistic models and to the benefit of stochastic planning.
Combining Answer Set Programming (ASP) and Constraint Logic Programming (CLP) can create a more powerful language for knowledge representation and reasoning. The language AC(C) is designed to integrate ASP and CLP. Compared with existing integration of ASP and CSP, AC(C) allows representing user-defined constraints. Such integration provides great power for applications requiring logical reasoning involving constraints, e.g., temporal planning. In AC(C), user-defined and primitive constraints can be solved by a CLP inference engine while the logical reasoning over those constraints and regular logic literals is solved by an ASP inference engine (i.e., solver). My PhD work includes improving the language AC(C), implementing its faster inference engine and investigating how effective the new system can be used to solve a challenging application, temporal planning.
The field of reinforcement learning can be split into model-based and model-free methods. Here, we unify these approaches by casting model-free policy optimisation as amortised variational inference, and model-based planning as iterative variational inference, within a `control as hybrid inference' (CHI) framework. We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference. Using a didactic experiment, we demonstrate that the proposed algorithm operates in a model-based manner at the onset of learning, before converging to a model-free algorithm once sufficient data have been collected. We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines. CHI thus provides a principled framework for harnessing the sample efficiency of model-based planning while retaining the asymptotic performance of model-free policy optimisation.
People routinely infer the goals of others by observing their actions over time. Remarkably, we can do so even when those actions lead to failure, enabling us to assist others when we detect that they might not achieve their goals. How might we endow machines with similar capabilities? Here we present an architecture capable of inferring an agent's goals online from both optimal and non-optimal sequences of actions. Our architecture models agents as boundedly-rational planners that interleave search with execution by replanning, thereby accounting for sub-optimal behavior. These models are specified as probabilistic programs, allowing us to represent and perform efficient Bayesian inference over an agent's goals and internal planning processes. To perform such inference, we develop Sequential Inverse Plan Search (SIPS), a sequential Monte Carlo algorithm that exploits the online replanning assumption of these models, limiting computation by incrementally extending inferred plans as new actions are observed. We present experiments showing that this modeling and inference architecture outperforms Bayesian inverse reinforcement learning baselines, accurately inferring goals from both optimal and non-optimal trajectories involving failure and back-tracking, while generalizing across domains with compositional structure and sparse rewards.