Planning & Scheduling
Multi-Agent Plan Recognition: Formalization and Algorithms
Banerjee, Bikramjit (University of Southern Mississippi) | Kraemer, Landon (University of Southern Mississippi) | Lyle, Jeremy (University of Southern Mississippi)
Multi-Agent Plan Recognition (MAPR) seeks to identify the dynamic team structures and team behaviors from the observations of the activity-sequences of a set of intelligent agents, based on a library of known team-activities (plan library). It has important applications in analyzing data from automated monitoring, surveillance, and intelligence analysis in general. In this paper, we formalize MAPR using a basic model that explicates the cost of abduction in single agent plan recognition by "flattening" or decompressing the (usually compact, hierarchical) plan library. We show that single-agent plan recognition with a decompressed library can be solved in time polynomial in the input size, while it is known that with a compressed (by partial ordering constraints) library it is NP-complete. This leads to an important insight: that although the compactness of the plan library plays an important role in the hardness of single-agent plan recognition (as recognized in the existing literature), that is not the case with multiple agents. We show, for the first time, that MAPR is NP-complete even when the (multi-agent) plan library is fully decompressed. As with previous solution approaches, we break the problem into two stages: hypothesis generation and hypothesis search. We show that Knuth's ``Algorithm X'' (with the efficient ``dancing links'' representation) is particularly suited for our model, and can be adapted to perform a branch and bound search for the second stage, in this model. We show empirically that this new approach leads to significant pruning of the hypothesis space in MAPR.
Security Games with Arbitrary Schedules: A Branch and Price Approach
Jain, Manish (University of Southern California) | Kardes, Erim (University of Southern California) | Kiekintveld, Christopher (University of Southern California) | Ordonez, Fernando (University of Southern California) | Tambe, Milind (University of Southern California)
Security games, and important class of Stackelberg games, are used in deployed decision-support tools in use by LAX police and the Federal Air Marshals Service. The algorithms used to solve these games find optimal randomized schedules to allocate security resources for infrastructure protection. Unfortunately, the state of the art algorithms either fail to scale or to provide a correct solution for large problems with arbitrary scheduling constraints. We introduce ASPEN, a branch-and-price approach that overcomes these limitations based on two key contributions: (i) A column-generation approach that exploits a novel network flow representation, avoiding a combinatorial explosion of schedule allocations; (ii) A branch-and-bound algorithm that generates bounds via a fast algorithm for solving security games with relaxed scheduling constraints. ASPEN is the first known method for efficiently solving massive security games with arbitrary schedules.
Reinforcement Learning via AIXI Approximation
Veness, Joel (University of New South Wales and NICTA) | Ng, Kee Siong (Medicare Australia and Australian National University) | Hutter, Marcus (Australian National University and NICTA) | Silver, David (University College London)
This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. This approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a Monte Carlo Tree Search algorithm along with an agent-specific extension of the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a number of stochastic, unknown, and partially observable domains.
Lazy Theta*: Any-Angle Path Planning and Path Length Analysis in 3D
Nash, Alex (University of Southern California) | Koenig, Sven (University of Southern California) | Tovey, Craig (Georgia Institute of Technology)
Grids with blocked and unblocked cells are often used to represent continuous 2D and 3D environments in robotics and video games. The shortest paths formed by the edges of 8-neighbor 2D grids can be up to 8% longer than the shortest paths in the continuous environment. Theta* typically finds much shorter paths than that by propagating information along graph edges (to achieve short runtimes) without constraining paths to be formed by graph edges (to find short "any-angle" paths). We show in this paper that the shortest paths formed by the edges of 26-neighbor 3D grids can be 13% longer than the shortest paths in the continuous environment, which highlights the need for smart path planning algorithms in 3D. Theta* can be applied to 3D grids in a straight-forward manner, but it performs a line-of-sight check for each unexpanded visible neighbor of each expanded vertex and thus it performs many more line-of-sight checks per expanded vertex on a 26-neighbor 3D grid than on an 8-neighbor 2D grid. We therefore introduce Lazy Theta*, a variant of Theta* which uses lazy evaluation to perform only one line-of-sight check per expanded vertex (but with slightly more expanded vertices). We show experimentally that Lazy Theta* finds paths faster than Theta* on 26-neighbor 3D grids, with one order of magnitude fewer line-of-sight checks and without an increase in path length.
High-Quality Policies for the Canadian Traveler's Problem
Eyerich, Patrick (Albert-Ludwigs-Universität Freiburg) | Keller, Thomas (Albert-Ludwigs-Universität Freiburg) | Helmert, Malte (Albert-Ludwigs-Universität Freiburg)
We consider the stochastic variant of the Canadian Traveler's Problem, a path planning problem where adverse weather can cause some roads to be untraversable. The agent does not initially know which roads can be used. However, it knows a probability distribution for the weather, and it can observe the status of roads incident to its location. The objective is to find a policy with low expected travel cost. We introduce and compare several algorithms for the stochastic CTP. Unlike the optimistic approach most commonly considered in the literature, the new approaches we propose take uncertainty into account explicitly. We show that this property enables them to generate policies of much higher quality than the optimistic one, both theoretically and experimentally.
Interactive Task-Plan Learning
Dong, Shuonan (Massachusetts Institute of Technology)
Low-level direct commanding of space robots can be time consuming or impractical for complex systems with many degrees of freedom. My research will adaptively raise the level of interaction between the operator and the robot by (1) allowing the robot to learn implicit plans by detecting patterns in the interaction history, and (2) enabling the human to demonstrate continuous motions through teleoperation. Learned tasks and plans are recorded for future use. I introduce a novel representation of continuous actions called parameterized probabilistic flow tubes that I hypothesize will more closely encode a human's intended motions and provide flexibility during execution in new situations. I also introduce the use of planning for plan recognition in the domain of hybrid tasks.
Hierarchical Skill Learning for High-Level Planning
MacGlashan, James (University of Maryland, Baltimore County)
I present skill bootstrapping, a proposed new research direction for agent learning and planning that allows an agent to start with low-level primitive actions, and develop skills that can be used for higher-level planning. Skills are developed over the course of solving many different problems in a domain, using reinforcement learning techniques to complement the benefits and disadvantages of heuristic-search planning. I describe the overall architecture of the proposed approach and discuss how it relates to other work.
Continual On-Line Planning
Lemons, Sofia (University of New Hampshire)
My research represents an approach to integrating planning and execution in time-sensitive environments. The primary focus is on a problem called continual on-line planning. New goals arrive stochastically during execution, the agent issues actions for execution one at a time, and the environment is otherwise deterministic. My dissertation will address this setting in three stages: optimizing total goal achievement time, handling on-line goal arrival during planning or execution, and adapting to changes in state also during planning or execution. My current approach to this problem is based on incremental heuristic search. The two central issues are the decision of which partial plans to elaborate during search and the decision of when to issue an action for execution. I have proposed an extension of Russell and Wefald's decision-theoretic A* algorithm that is not limited by assumptions of an admissible heuristic like DTA*. This algorithm, Decision Theoretic On-line Continual Search (DTOCS), handles the complexities of the on-line setting by balancing deliberative planning and real-time response.
Probabilistic Programming for Planning Problems
Thon, Ingo (KU Leuven) | Gutmann, Bernd (KU Leuven) | Broeck, Guy Van den (KU Leuven)
Probabilistic programing is an emerging field at the intersection of statistical learning and programming languages. An appealing property of probabilistic programming languages (PPL) is their support for constructing arbitrary probability distributions. This allows one to model many different domains and solve a variety of problems. We show the link between probabilistic planning and PPLs by introducing a translation that allows one to map probabilistic planning problems onto parameter learning in PPLs. The advantage of our approach is twofold. Firstly, having the expressivity of a programming language simplifies modeling compared to using existing planning languages such as PPDDL. Secondly, there exist effective general-purpose learning algorithms that — having the correct encoding — can readily be used to learn optimal policies. In this paper we use ProbLog — a probabilistic version of Prolog — as programming language, but our approach can be applied on any other PPL as well.
Plan Libraries for Plan Recognition: Do We Really Know What They Model?
Goldman, Robert P. (SIFT, LLC) | Kabanza, Froduald (Universite de Sherbrooke) | Bellefeuille, Philipe (Universite de Sherbrooke)
In this paper we explore challenges related to the engineering of plan libraries for plan recognition. This is an often overlooked problem, yet essential in the design of any real world plan recognizers. We mainly discuss challenges related to the evaluation of equivalence between plan libraries. We explain why this is a potential source of ill-conceived plan libraries. We consider an existing well-established probabilistic plan recognizer as vehicle for our discussion, using the formalism of probabilistic hierarchical task networks to represent plans. We propose avenues for exploring solutions to those challenges within that framework.