Search
Combined Task and Motion Planning for Mobile Manipulation
Wolfe, Jason (University of California, Berkeley) | Marthi, Bhaskara (Willow Garage, Inc) | Russell, Stuart (University of California, Berkeley)
We present a hierarchical planning system and its application to robotic manipulation. The novel features of the system are: 1) it finds high-quality kinematic solutions to task-level problems; 2) it takes advantage of subtask-specific irrelevance information, reusing optimal solutions to state-abstracted subproblems across the search space. We briefly describe how the system handles uncertainty during plan execution, and present results on discrete problems as well as pick-and-place tasks for a mobile robot.
The More, the Merrier: Combining Heuristic Estimators for Satisficing Planning
Röger, Gabriele (Albert-Ludwigs-Universität Freiburg) | Helmert, Malte (Albert-Ludwigs-Universität Freiburg)
We empirically examine several ways of exploiting the information of multiple heuristics in a satisficing best-first search algorithm, comparing their performance in terms of coverage, plan quality, speed, and search guidance. Our results indicate that using multiple heuristics for satisficing search is indeed useful. Among the combination methods we consider, the best results are obtained by the alternation method of the "Fast Diagonally Downward" planner.
On Adversarial Search Spaces and Sampling-Based Planning
Ramanujan, Raghuram (Cornell University) | Sabharwal, Ashish (Cornell University) | Selman, Bart (Cornell University)
Upper Confidence bounds applied to Trees (UCT), a bandit-based Monte-Carlo sampling algorithm for planning, has recently been the subject of great interest in adversarial reasoning. UCT has been shown to outperform traditional minimax based approaches in several challenging domains such as Go and Kriegspiel, although minimax search still prevails in other domains such as Chess. This work provides insights into the properties of adversarial search spaces that play a key role in the success or failure of UCT and similar sampling-based approaches. We show that certain "early loss" or "shallow trap" configurations, while unlikely in Go, occur surprisingly often in games like Chess (even in grandmaster games). We provide evidence that UCT, unlike minimax search, is unable to identify such traps in Chess and spends a great deal of time exploring much deeper game play than needed.
Waking Up a Sleeping Rabbit: On Natural-Language Sentence Generation with FF
Koller, Alexander (Saarland University) | Hoffmann, Joerg (INRIA)
We present a planning domain that encodes the problem of generating natural language sentences. This domain has a number of features that provoke fairly unusual behavior in planners. In particular, hitherto no existing automated planner was sufficiently effective to be of practical value in this application. We analyze in detail the reasons for ineffectiveness in FF, resulting in a few minor implementation fixes in FF's preprocessor, and in a basic reconfiguration of its search options. The performance of the modified FF is up to several orders of magnitude better than that of the original FF, and for the first time makes automated planners a practical possibility for this application. Beside thus highlighting the importance of preprocessing and automated configuration techniques, we show that the domain still poses several interesting challenges to the development of search heuristics.
Iterative Learning of Weighted Rule Sets for Greedy Search
Xu, Yuehua (Oregon State University) | Fern, Alan (Oregon State University) | Yoon, Sungwook (Palo Alto Research Center)
Greedy search is commonly used in an attempt to generate solutions quickly at the expense of completeness and optimality. In this work, we consider learning sets of weighted action-selection rules for guiding greedy search with application to automated planning. We make two primary contributions over prior work on learning for greedy search. First, we introduce weighted sets of action-selection rules as a new form of control knowledge for greedy search. Prior work has shown the utility of action-selection rules for greedy search, but has treated the rules as hard constraints, resulting in brittleness. Our weighted rule sets allow multiple rules to vote, helping to improve robustness to noisy rules. Second, we give a new iterative learning algorithm for learning weighted rule sets based on RankBoost, an efficient boosting algorithm for ranking. Each iteration considers the actual performance of the current rule set and directs learning based on the observed search errors. This is in contrast to most prior approaches, which learn control knowledge independently of the search process. Our empirical results have shown significant promise for this approach in a number of domains.
Simultaneously Searching with Multiple Settings: An Alternative to Parameter Tuning for Suboptimal Single-Agent Search Algorithms
Valenzano, Richard Anthony (University of Alberta) | Sturtevant, Nathan (University of Alberta) | Schaeffer, Jonathan (University of Alberta) | Buro, Karen (Grant MacEwan University) | Kishimoto, Akihiro (Tokyo Institute of Technology and Japan Science and Technology Agency)
Many search algorithms have parameters that need to be tuned to get the best performance. Typically, the parameters are tuned offline, resulting in a generic setting that is supposed to be effective on all problem instances. For suboptimal single-agent search, problem-instance-specific parameter settings can result in substantially reduced search effort. We consider the use of dovetailing as a way to take advantage of this fact. Dovetailing is a procedure that performs search with multiple parameter settings simultaneously. Dovetailing is shown to improve the search speed of weighted IDA* by several orders of magnitude and to generally enhance the performance of weighted RBFS. This procedure is trivially parallelizable and is shown to be an effective form of parallelization for WA* and BULB. In particular, using WA* with parallel dovetailing yields good speedups in the sliding-tile puzzle domain, and increases the number of problems solved when used in an automated planning system.
The Joy of Forgetting: Faster Anytime Search via Restarting
Richter, Silvia (Griffith University &) | Thayer, Jordan T. (NICTA) | Ruml, Wheeler (University of New Hampshire)
Anytime search algorithms solve optimisation problems by quickly finding a (usually suboptimal) first solution and then finding improved solutions when given additional time. To deliver an initial solution quickly, they are typically greedy with respect to the heuristic cost-to-go estimate h. In this paper, we show that this low-h bias can cause poor performance if the greedy search makes early mistakes. Building on this observation, we present a new anytime approach that restarts the search from the initial state every time a new solution is found. We demonstrate the utility of our method via experiments in PDDL planning as well as other domains, and show that it is particularly useful for problems where the heuristic has systematic errors.
Partially Informed Depth-First Search for the Job Shop Problem
Mencía, Carlos (University of Oviedo) | Sierra, María R. (University of Oviedo) | Varela, Ramiro (University of Oviedo)
We propose a partially informed depth-first search algorithm to cope with the Job Shop Scheduling Problem with makespan minimization. The algorithm is built from the well-known P. Brucker's branch and bound algorithm. We improved the heuristic estimation of Brucker's algorithm by means of constraint propagation rules and so devised a more informed heuristic which is proved to be monotonic. We conducted an experimental study across medium and large instances. The results show that the proposed algorithm reaches optimal solutions for medium instances taking less time than branch and bound and that for large instances it reaches much better lower and upper bounds when both algorithms are given the same amount of time.
Pattern Database Heuristics for Fully Observable Nondeterministic Planning
Mattmüller, Robert (University of Freiburg) | Ortlieb, Manuela (University of Freiburg) | Helmert, Malte (University of Freiburg) | Bercher, Pascal (University of Ulm)
When planning in an uncertain environment, one is often interested in finding a contingent plan that prescribes appropriate actions for all possible states that may be encountered during the execution of the plan. We consider the problem of finding strong cyclic plans for fully observable nondeterministic (FOND) planning problems. The algorithm we choose is LAO*, an informed explicit state search algorithm. We investigate the use of pattern database (PDB) heuristics to guide LAO* towards goal states. To obtain a fully domain-independent planning system, we use an automatic pattern selection procedure that performs local search in the space of pattern collections. The evaluation of our system on the FOND benchmarks of the Uncertainty Part of the International Planning Competition 2008 shows that our approach is competitive with symbolic regression search in terms of problem coverage, speed, and plan quality.
Classical Planning in MDP Heuristics: with a Little Help from Generalization
Kolobov, Andrey (University of Washington, Seattle) | Mausam, . (University of Washington, Seattle) | Weld, Daniel S. (University of Washington, Seattle)
Computing a good policy in stochastic uncertain environments with unknown dynamics and reward model parameters is a challenging task. In a number of domains, ranging from space robotics to epilepsy management, it may be possible to have an initial training period when suboptimal performance is permitted. For such problems it is important to be able to identify when this training period is complete, and the computed policy can be used with high confidence in its future performance. A simple principled criteria for identifying when training has completed is when the error bounds on the value estimates of the current policy are sufficiently small that the optimal policy is fixed, with high probability. We present an upper bound on the amount of training data required to identify the optimal policy as a function of the unknown separation gap between the optimal and the next-best policy values. We illustrate with several small problems that by estimating this gap in an online manner, the number of training samples to provably reach optimality can be significantly lower than predicted offline using a Probably Approximately Correct framework that requires an input epsilon parameter.