Keller, Thomas
Trial-Based Heuristic Tree Search for MDPs with Factored Action Spaces
Geißer, Florian (Australian National University) | Speck, David (University of Freiburg) | Keller, Thomas (University of Basel)
MDPs with factored action spaces, i.e., where actions are described as assignments to a set of action variables, allow reasoning over action variables instead of action states, yet most algorithms only consider a grounded action representation. This includes algorithms that are instantiations of the Trial-based Heuristic Tree Search (THTS) framework, such as AO* or UCT. To be able to reason over factored action spaces, we propose a generalization of THTS where nodes that branch over all applicable actions are replaced with subtrees that consist of nodes that represent the decision for a single action variable. We show that many THTS algorithms retain their theoretical properties under the generalized framework, and show how to approximate any state-action heuristic with a heuristic for partial action assignments. This allows us to guide a UCT variant that is able to create exponentially fewer nodes than the same algorithm that considers ground actions. An empirical evaluation on the benchmark set of the probabilistic track of the latest International Planning Competition validates the benefits of the approach.
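The source of the exponential gap can be sketched in a few lines: a grounded representation enumerates one action per full assignment to the action variables, while a factored decision layer branches over one variable at a time. The action variables below are purely hypothetical, chosen only to make the counts concrete.

```python
from itertools import product

# Hypothetical factored action space: one variable with two values,
# two binary variables (not from the paper; for illustration only).
action_vars = {"move": ["north", "south"], "grip": [0, 1], "beep": [0, 1]}

# Grounded representation: one action per complete assignment,
# i.e. the product of the domain sizes.
ground_actions = list(product(*action_vars.values()))
num_ground = len(ground_actions)

# Factored decision nodes: each node branches only over a single
# action variable's domain, so per layer the branching factors add
# up instead of multiplying.
num_factored_decisions = sum(len(d) for d in action_vars.values())

print(num_ground, num_factored_decisions)  # 8 vs. 6
```

With ten binary action variables the gap is already 1024 ground actions versus 20 per-variable branches, which is the effect the abstract refers to as creating exponentially fewer nodes.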
Narrowing the Gap Between Saturated and Optimal Cost Partitioning for Classical Planning
Seipp, Jendrik (University of Basel) | Keller, Thomas (University of Basel) | Helmert, Malte (University of Basel)
In classical planning, cost partitioning is a method for admissibly combining a set of heuristic estimators by distributing operator costs among the heuristics. An optimal cost partitioning is often prohibitively expensive to compute. Saturated cost partitioning is an alternative that is much faster to compute and has been shown to offer high-quality heuristic guidance on Cartesian abstractions. However, its greedy nature makes it highly susceptible to the order in which the heuristics are considered. We show that searching in the space of orders leads to significantly better heuristic estimates than with previously considered orders. Moreover, using multiple orders leads to a heuristic that is significantly better informed than any single-order heuristic. In experiments with Cartesian abstractions, the resulting heuristic approximates the optimal cost partitioning very closely.
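The greedy scheme described above can be sketched abstractly: each heuristic reports the minimum ("saturated") cost it needs per operator to preserve its estimates, keeps that share, and passes the remaining costs to the next heuristic in the order. The interface and the two toy heuristics below are assumptions for illustration, not the paper's implementation.

```python
# Sketch of saturated cost partitioning (assumed interface): a
# "saturator" maps the remaining operator costs to the per-operator
# costs the heuristic actually needs.

def saturated_cost_partitioning(order, saturators, costs):
    """Greedily give each heuristic its saturated costs and hand the
    remaining operator costs to the next heuristic in the order."""
    remaining = dict(costs)
    partition = []
    for h in order:
        needed = saturators[h](remaining)  # per-operator saturated costs
        partition.append((h, needed))
        for op in remaining:
            remaining[op] -= needed.get(op, 0)
    return partition

# Hypothetical example: two heuristics over operators a and b.
costs = {"a": 3, "b": 2}
saturators = {
    "h1": lambda c: {"a": min(c["a"], 2)},       # h1 only needs cost on a
    "h2": lambda c: {"a": c["a"], "b": c["b"]},  # h2 uses whatever is left
}
partition = saturated_cost_partitioning(["h1", "h2"], saturators, costs)
print(partition)
```

Reversing the order would let h2 consume all costs and leave nothing for h1, which illustrates why the order matters so much and why searching in the space of orders, and maximizing over several of them, pays off.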
Delete Relaxations for Planning with State-Dependent Action Costs
Geißer, Florian (University of Freiburg) | Keller, Thomas (University of Freiburg) | Mattmüller, Robert (University of Freiburg)
Most work in planning focuses on tasks with state-independent or even uniform action costs. However, supporting state-dependent action costs admits a more compact representation of many tasks. We investigate how to solve such tasks using heuristic search, with a focus on delete-relaxation heuristics. We first define a generalization of the additive heuristic to such tasks and then discuss different ways of computing it via compilations to tasks with state-independent action costs and more directly by modifying the relaxed planning graph. We evaluate these approaches theoretically and present an implementation of the additive heuristic for planning with state-dependent action costs. To our knowledge, this gives rise to the first approach able to handle even the hardest instances of the combinatorial Academic Advising domain from the International Probabilistic Planning Competition (IPPC) 2014.
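To make the setting concrete, here is a toy delete-relaxed additive heuristic where action costs are functions of a state. As a crude simplification (my assumption, not the paper's construction), each cost function is evaluated on the set of facts reached so far in the relaxation:

```python
# Toy h_add sketch for a STRIPS-like task with state-dependent action
# costs. An action is (preconditions, add effects, cost_fn), where
# cost_fn maps a set of facts to a number. Evaluating cost_fn on the
# set of already-reached facts is a simplifying assumption here.

def h_add(state, goal, actions):
    """Additive heuristic: cheapest relaxed cost per fact, summed
    over the goal facts; unreachable goals yield infinity."""
    cost = {f: 0 for f in state}
    changed = True
    while changed:
        changed = False
        for pre, add, cost_fn in actions:
            if all(p in cost for p in pre):
                c = sum(cost[p] for p in pre) + cost_fn(set(cost))
                for f in add:
                    if f not in cost or c < cost[f]:
                        cost[f] = c
                        changed = True
    return sum(cost.get(g, float("inf")) for g in goal)

# Hypothetical task: the second action gets more expensive once "b"
# holds, so its cost genuinely depends on the state.
actions = [
    ({"a"}, {"b"}, lambda s: 1),
    ({"b"}, {"c"}, lambda s: 2 if "b" in s else 1),
]
value = h_add({"a"}, {"c"}, actions)
print(value)
```

The paper's actual approaches, via compilations or a modified relaxed planning graph, handle the interaction between cost functions and the relaxation more carefully than this one-pass evaluation does.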
Better Be Lucky than Good: Exceeding Expectations in MDP Evaluation
Keller, Thomas (University of Freiburg) | Geißer, Florian (University of Freiburg)
Markov Decision Processes (MDPs) offer a general framework to describe probabilistic planning problems of varying complexity. The development of algorithms that act successfully in MDPs is important to many AI applications. Since it is often impossible or intractable to evaluate MDP algorithms based on a theoretical analysis alone, the International Probabilistic Planning Competition (IPPC) was introduced to allow a comparison based on experimental evaluation. The idea is to approximate the quality of an MDP solver by performing a sequence of runs on a problem instance, and by using the average of the obtained results as an approximation of the expected reward. Two other algorithms require the knowledge of the optimal policy and its expected reward. We show that the expected reward of the optimal policy is a lower bound for the expected performance of both strategies. Our final algorithm switches between the application of the optimal policy and the policy with the highest possible outcome, which can be computed without notable overhead in the Trial-based Heuristic Tree Search (THTS) framework (Keller and Helmert 2013). We show theoretically and empirically that all algorithms outperform the naïve base approach that ignores the potential of optimizing evaluation runs in hindsight, and that it pays off to take suboptimal base …
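The IPPC evaluation scheme mentioned here is plain Monte Carlo estimation: run the policy repeatedly and average the accumulated rewards. A minimal sketch, with a hypothetical `run_policy` interface that performs one run and returns its total reward:

```python
import random

def ippc_score(run_policy, num_runs=100, seed=0):
    """Approximate a policy's expected reward by averaging the
    rewards of repeated runs on the same problem instance."""
    rng = random.Random(seed)
    rewards = [run_policy(rng) for _ in range(num_runs)]
    return sum(rewards) / len(rewards)

# Toy policy whose runs yield reward 1 with probability 0.3, else 0,
# so its expected reward is 0.3.
approx = ippc_score(lambda rng: 1.0 if rng.random() < 0.3 else 0.0,
                    num_runs=10000)
print(round(approx, 2))  # close to the true expected reward 0.3
```

The paper's point is that a solver judged by this average need not maximize expected reward per run: across a fixed number of runs it can pay off to gamble on high-variance outcomes, i.e., to be lucky rather than good.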
Trial-Based Heuristic Tree Search for Finite Horizon MDPs
Keller, Thomas (University of Freiburg) | Helmert, Malte (University of Basel)
Dynamic programming is a well-known approach for solving MDPs. In large state spaces, asynchronous versions like Real-Time Dynamic Programming have been applied successfully. If unfolded into equivalent trees, Monte-Carlo Tree Search algorithms are a valid alternative. UCT, the most popular representative, obtains good anytime behavior by guiding the search towards promising areas of the search tree. The Heuristic Search algorithm AO∗ finds optimal solutions for MDPs that can be represented as acyclic AND/OR graphs. We introduce a common framework, Trial-based Heuristic Tree Search, that subsumes these approaches and distinguishes them based on five ingredients: heuristic function, backup function, action selection, outcome selection, and trial length. Using this framework, we describe three new algorithms which mix these ingredients in novel ways in an attempt to combine their different strengths. Our evaluation shows that two of our algorithms not only provide superior theoretical properties to UCT, but also outperform state-of-the-art approaches experimentally.
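Of the five THTS ingredients, action selection is the easiest to isolate: UCT uses the UCB1 rule, which balances a child's average reward against an exploration bonus for rarely tried actions. A sketch under assumed statistics (visit count and accumulated reward per action; the numbers are invented):

```python
import math

def uct_select(node_visits, child_stats, exploration=math.sqrt(2)):
    """UCB1 action selection as used by UCT: pick the action that
    maximizes average reward plus an exploration bonus. In THTS this
    is just one interchangeable ingredient among five."""
    def ucb1(stats):
        visits, total_reward = stats
        if visits == 0:
            return float("inf")  # always try unvisited actions first
        return (total_reward / visits
                + exploration * math.sqrt(math.log(node_visits) / visits))
    return max(child_stats, key=lambda a: ucb1(child_stats[a]))

# Hypothetical statistics: (visit count, accumulated reward) per action.
stats = {"left": (10, 7.0), "right": (2, 1.0)}
chosen = uct_select(node_visits=12, child_stats=stats)
print(chosen)
```

Although "left" has the higher average reward (0.7 vs. 0.5), the bonus for the rarely visited "right" dominates here, so the rule explores it; swapping in a different selection, backup, or trial-length rule yields the other algorithms the framework subsumes.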
A Planning Based Framework for Controlling Hybrid Systems
Löhr, Johannes (University of Freiburg) | Eyerich, Patrick (University of Freiburg) | Keller, Thomas (University of Freiburg) | Nebel, Bernhard (University of Freiburg)
The control of dynamic systems, which aims to minimize the deviation of state variables from reference values in a continuous state space, is a central domain of cybernetics and control theory. The objective of action planning is to find feasible state trajectories in a discrete state space from an initial state to a state satisfying the goal conditions, which in principle addresses the same issue on a more abstract level. We combine these approaches to switch between dynamic system characteristics on the fly, and to generate control input sequences that affect both discrete and continuous state variables. Our approach (called Domain Predictive Control) is applicable to hybrid systems with linear dynamics and discretizable inputs.
High-Quality Policies for the Canadian Traveler's Problem
Eyerich, Patrick (Albert-Ludwigs-Universität Freiburg) | Keller, Thomas (Albert-Ludwigs-Universität Freiburg) | Helmert, Malte (Albert-Ludwigs-Universität Freiburg)
We consider the stochastic variant of the Canadian Traveler's Problem, a path planning problem where adverse weather can cause some roads to be untraversable. The agent does not initially know which roads can be used. However, it knows a probability distribution for the weather, and it can observe the status of roads incident to its location. The objective is to find a policy with low expected travel cost. We introduce and compare several algorithms for the stochastic CTP. Unlike the optimistic approach most commonly considered in the literature, the new approaches we propose take uncertainty into account explicitly. We show that this property enables them to generate policies of much higher quality than the optimistic one, both theoretically and experimentally.
Coming Up With Good Excuses: What to do When no Plan Can be Found
Göbelbecker, Moritz (Albert-Ludwigs-University Freiburg) | Keller, Thomas (Albert-Ludwigs-University Freiburg) | Eyerich, Patrick (Albert-Ludwigs-University Freiburg) | Brenner, Michael (Albert-Ludwigs-University Freiburg) | Nebel, Bernhard (Albert-Ludwigs-University Freiburg)
When using a planner-based agent architecture, many things can go wrong. First and foremost, an agent might fail to execute one of the planned actions for some reason. Even more annoying, however, is a situation where the agent is incompetent, i.e., unable to come up with a plan. This might be due to the fact that there are principal reasons that prohibit a successful plan, or simply because the task's description is incomplete or incorrect. In either case, an explanation for such a failure would be very helpful. We address this problem and provide a formalization of coming up with excuses for not being able to find a plan. Based on that, we present an algorithm that is able to find excuses and demonstrate that such excuses can be found in practical settings in reasonable time.