Planning & Scheduling
Efficient Bounds in Heuristic Search Algorithms for Stochastic Shortest Path Problems
Hansen, Eric A. (Mississippi State Univeristy) | Abdoulahi, Ibrahim (Mississippi State University)
Fully observable decision-theoretic planning problems are commonly modeled as stochastic shortest path (SSP) problems. For this class of planning problems, heuristic search algorithms (including LAO*, RTDP, and related algorithms), as well as the value iteration algorithm on which they are based, lack an efficient test for convergence to an ε-optimal policy (except in the special case of discounting). We introduce a simple and efficient test for convergence that applies to SSP problems with positive action costs. The test can detect whether a policy is proper, that is, whether it achieves the goal state with probability 1. If proper, it gives error bounds that can be used to detect convergence to an ε-optimal solution. The convergence test incurs no extra overhead besides computing the Bellman residual, and the performance guarantee it provides substantially improves the utility of this class of planning algorithms.
Measuring Plan Diversity: Pathologies in Existing Approaches and A New Plan Distance Metric
Goldman, Robert P. (SIFT, LLC) | Kuter, Ugur (SIFT, LLC)
In this paper we present a plan-plan distance metric based on Kolmogorov(Algorithmic) complexity. Generating diverse sets of plans is useful for task ssuch as probing user preferences and reasoning about vulnerability to cyberattacks. Generating diverse plans, and comparing different diverse planning approaches requires a domain-independent, theoretically motivated definition of the diversity distance between plans. Previously proposed diversity measures are not theoretically motivated, and can provide inconsistent results on the sameplans. We define the diversity of plans in terms of how surprising one plan is givenanother or, its inverse, the conditional information in one plan givenanother. Kolmogorov complexity provides a domain independent theory of conditional information. While Kolmogorov complexity is not computable, a related metric, Normalized Compression Distance (NCD), provides a well-behaved approximation. In this paper we introduce NCD as an alternative diversity metric, and analyze its performance empirically, in comparison with previous diversity measures, showing strengths and weaknesses of each.We also examine the use of different compressor sin NCD. We show how NCD can be used to select a training set for HTN learning,giving an example of the utility of diversity metrics. We conclude withsuggestions for future work on improving, extending, and applying it to serve new applications.
Transition Constraints for Parallel Planning
Ghooshchi, Nina Ghanbari (Urmia University) | Namazi, Majid (Urmia University) | Newton, M A Hakim (Griffith University) | Sattar, Abdul (Griffith University)
We present a planner named Transition Constraints for Parallel Planning (TCPP). TCPP constructs a new constraint model from domain transition graphs (DTG) of a given planning problem. TCPP encodes the constraint model by using table constraints that allow don't cares or wild cards as cell values. TCPP uses Minion the constraint solver to solve the constraint model and returns the parallel plan. Empirical results exhibit the efficiency of our planning system over state-of-the-art constraint-based planners.
Factored MCTS for Large Scale Stochastic Planning
Cui, Hao (Tufts University) | Khardon, Roni (Tufts University) | Fern, Alan (Oregon State University) | Tadepalli, Prasad (Oregon State University)
This paper investigates stochastic planning problemswith large factored state and action spaces. We show that even with moderate increase in the size of existing challenge problems, the performance of state of the art algorithms deteriorates rapidly, making them ineffective.To address this problem we propose a family of simple but scalable online planning algorithms that combine sampling, as in Monte Carlo tree search, with “aggregation,” where the aggregation approximates a distribution over random variables by the product of their marginals. The algorithms are correct under some rather strong technical conditions and can serve as an unsound but effective heuristic when the conditions do not hold. An extensive experimental evaluation demonstrates that the new algorithms provide significant improvement over the state of the art when solving largeproblems in a number of challenge benchmark domains.
Strong Temporal Planning with Uncontrollable Durations: A State-Space Approach
Cimatti, Alessandro (Fondazione Bruno Kessler) | Micheli, Andrea (Fondazione Bruno Kessler) | Roveri, Marco (Fondazione Bruno Kesslerr)
In many practical domains, planning systems are required to reason about durative actions. A common assumption in the literature is that the executor is allowed to decide the duration of each action. However, this assumption may be too restrictive for applications. In this paper, we tackle the problem of temporal planning with uncontrollable action durations. We show how to generate robust plans,that guarantee goal achievement despite the uncontrollability of the actual duration of the actions. We extend the state-space temporalplanning framework, integrating recent techniques for solving temporalproblems under uncertainty. We discuss different ways of lifting the total order plans generated by the heuristic search to partial orderplans, showing (in)completeness results for each of them. We implemented our approach on top of COLIN, a state-of-the-art planner. An experimental evaluation over several benchmark problems shows the practical feasibility of the proposed approach.
Some Fixed Parameter Tractability Results for Planning with Non-Acyclic Domain-Transition Graphs
Bäckström, Christer (Linköping University, Linköping, Sweden)
Bäckström studied the parameterised complexity of planning when the domain-transition graphs (DTGs) are acyclic. He used the parameters d (domain size), k (number of paths in the DTGs) and w (treewidth of the causal graph), and showed that planning is fixed-parameter tractable (fpt) in these parameters, and fpt in only parameter k if the causal graph is a polytree. We continue this work by considering some additional cases of non-acyclic DTGs. In particular, we consider the case where each strongly connected component (SCC) in a DTG must be a simple cycle, and we show that planning is fpt for this case if the causal graph is a polytree. This is done by first preprocessing the instance to construct an equivalent abstraction and then apply Bäckströms technique to this abstraction. We use the parameters d and k , reinterpreting this as the number of paths in the condensation of a DTG, and the two new parameters c (the number of contracted cycles along a path) and p max (an upper bound for walking around cycles, when not unbounded).
Multi-Robot Auctions for Allocation of Tasks with Temporal Constraints
Nunes, Ernesto (University of Minnesota) | Gini, Maria (University of Minnesota)
We propose an auction algorithm to allocate tasks that have temporal constraints to cooperative robots. Temporal constraints are expressed as time windows, within which a task must be executed. There are no restrictions on the time windows, which are allowed to overlap. Robots model their temporal constraints using a simple temporal network, enabling them to maintain consistent schedules. When bidding on a task, a robot takes into account its own current commitments and an optimization objective, which is to minimize the time of completion of the last task alone or in combination with minimizing the distance traveled. The algorithm works both when all the tasks are known upfront and when tasks arrive dynamically. We show the performance of the algorithm in simulation with different numbers of tasks and robots, and compare it with a baseline greedy algorithm and a state-of-the-art auction algorithm. Our algorithm is computationally frugal and consistently allocates more tasks than the competing algorithms.
Pruning Game Tree by Rollouts
Huang, Bojun (Microsoft Research)
In this paper we show that the alpha-beta algorithm and its successor MT-SSS*, as two classic minimax search algorithms, can be implemented as rollout algorithms , a generic algorithmic paradigm widely used in many domains. Specifically, we define a family of rollout algorithms, in which the rollout policy is restricted to select successor nodes only from a certain subset of the children list. We show that any rollout policy in this family (either deterministic or randomized) is guaranteed to evaluate the game tree correctly with a finite number of rollouts. Moreover, we identify simple rollout policies in this family that ``implement'' alpha-beta and MT-SSS*. Specifically, given any game tree, the rollout algorithms with these particular policies always visit the same set of leaf nodes in the same order with alpha-beta and MT-SSS*, respectively. Our results suggest that traditional pruning techniques and the recent Monte Carlo Tree Search algorithms, as two competing approaches for game tree evaluation, may be unified under the rollout paradigm.
Reusing Previously Found A* Paths for Fast Goal-Directed Navigation in Dynamic Terrain
Hernandez, Carlos (Universidad Católica de la Santísima Concepción) | Asin, Roberto (Universidad Católica de la Santísima Concepción) | Baier, Jorge A (Pontificia Universidad Catolica de Chile)
Generalized Adaptive A* (GAA*) is an incremental algorithm that replans using A* when solving goal-directed navigation problems in dynamic terrain. Immediately after each A* search, it runs an efficient procedure that updates the heuristic values of states that were just expanded by A*, making them more informed. Those updates allow GAA* to speed up subsequent A* searches. Being based on A*, it is simple to describe and communicate; however, it is outperformed by other incremental algorithms like the state-of-the-art D*Lite algorithm at goal-directed navigation. In this paper we show how GAA* can be modified to exploit more information from a previous search in addition to the updated heuristic function. Specifically, we show how GAA* can be modified to utilize the paths found by a previous A* search. Our algorithm — Multipath Generalized Adaptive A* (MPGAA*) — has the same theoretical properties of GAA* and differs from it by only a few lines of pseudocode. Arguably, MPGAA* is simpler to understand than D*Lite. We evaluate MPGAA* over various realistic dynamic terrain settings, and observed that it generally outperforms the state-of-the-art algorithm D*Lite in scenarios resembling outdoor and indoor navigation.
Bayesian Affect Control Theory of Self
Hoey, Jesse (University of Waterloo) | Schroeder, Tobias (Potsdam University of Applied Sciences)
Notions of identity and of the self have long been studied in social psychology and sociology as key guiding elements of social interaction and coordination. In the AI of the future, these notions will also play a role in producing natural, socially appropriate artificially intelligent agents that encompass subtle and complex human social and affective skills. We propose here a Bayesian generalization of the sociological affect control theory of self as a theoretical foundation for socio-affectively skilled artificial agents. This theory posits that each human maintains an internal model of his or her deep sense of "self" that captures their emotional, psychological, and socio-cultural sense of being in the world. The "self" is then externalised as an identity within any given interpersonal and institutional situation, and this situational identity is the person's local (in space and time) representation of the self. Situational identities govern the actions of humans according to affect control theory. Humans will seek situations that allow them to enact identities consistent with their sense of self. This consistency is cumulative over time: if some parts of a person's self are not actualized regularly, the person will have a growing feeling of inauthenticity that they will seek to resolve. In our present generalisation, the self is represented as a probability distribution, allowing it to be multi-modal (a person can maintain multiple different identities), uncertain (a person can be unsure about who they really are), and learnable (agents can learn the identities and selves of other agents). We show how the Bayesian affect control theory of self can underpin artificial agents that are socially intelligent.