Goto

Collaborating Authors

 Planning & Scheduling


Iterative Learning of Weighted Rule Sets for Greedy Search

AAAI Conferences

Greedy search is commonly used in an attempt to generate solutions quickly at the expense of completeness and optimality. In this work, we consider learning sets of weighted action-selection rules for guiding greedy search with application to automated planning. We make two primary contributions over prior work on learning for greedy search. First, we introduce weighted sets of action-selection rules as a new form of control knowledge for greedy search. Prior work has shown the utility of action-selection rules for greedy search, but has treated the rules as hard constraints, resulting in brittleness. Our weighted rule sets allow multiple rules to vote, helping to improve robustness to noisy rules. Second, we give a new iterative learning algorithm for learning weighted rule sets based on RankBoost, an efficient boosting algorithm for ranking. Each iteration considers the actual performance of the current rule set and directs learning based on the observed search errors. This is in contrast to most prior approaches, which learn control knowledge independently of the search process. Our empirical results have shown significant promise for this approach in a number of domains.


Choosing Path Replanning Strategies for Unmanned Aircraft Systems

AAAI Conferences

Unmanned aircraft systems use a variety of techniques to plan collision-free flight paths given a map of obstacles and no-fly zones. However, maps are not perfect and obstacles may change over time or be detected during flight, which may invalidate paths that the aircraft is already following. Thus, dynamic in-flight replanning is required. Numerous strategies can be used for replanning, where the time requirements and the plan quality associated with each strategy depend on the environment around the original flight path. In this paper, we investigate the use of machine learning techniques, in particular support vector machines, to choose the best possible replanning strategy depending on the amount of time available. The system has been implemented, integrated and tested in hardware-in-the-loop simulation with a Yamaha RMAX helicopter platform.


Computing Applicability Conditions for Plans with Loops

AAAI Conferences

The utility of including loops in plans has been long recognized by the planning community. Loops in a plan help increase both its applicability and the compactness of representation. However, progress in finding such plans has been limited largely due to lack of methods for reasoning about the correctness and safety properties of loops of actions. We present novel algorithms for determining the applicability and progress made by a general class of loops of actions. These methods can be used for directing the search for plans with loops towards greater applicability while guaranteeing termination, as well as in post-processing of computed plans to precisely characterize their applicability. Experimental results demonstrate the efficiency of these algorithms.


Constraint Propagation in Propositional Planning

AAAI Conferences

Planning as Satisfiability is a most successful approach to optimal propositional planning. It draws its strength from the efficiency of state-of-the-art propositional satisfiability solvers, combined with the utilization of constraints that are inferred from the problem planning graph. One of the recent improvements of the framework is the addition of long-distance mutual exclusion (londex) constraints that relate facts and actions which refer to different time steps. In this paper we compare different encodings of planning as satisfiability wrt the constraint propagation they achieve in a modern SAT solver. This analysis explains some of the differences observed in the performance of different encodings, and leads to some interesting conclusions. For instance, the Blackbox encoding achieves more propagation than the one of Satplan06, and therefore is a stronger formulation of planning as satisfiability. Moreover, our investigation suggests a new more compact and stronger model for the problem. We prove that in this new formulation many of the londex constraints are redundant in the sense that they do not add anything to the constraint propagation achieved by the model. Experimental results suggest that the theoretical results obtained are practically relevant.


Handling Goal Utility Dependencies in a Satisfiability Framework

AAAI Conferences

Goal utility dependencies arise when the utility of achieving a goal depends on the other goals that are achieved with it. This complicates the planning procedure because achieving a new goal can potentially alter the utilities of all the other goals currently achieved. In this paper, we present an encoding procedure that enables general-purpose Max-SAT solvers to be used to solve planning problems with goal utility dependencies. We compare this approach to one using integer programming via an empirical evaluation using benchmark problems from past international planning competitions. Our results indicate that this approach is competitive and sometimes more successful than an integer programming one -- solving two to three times more subproblems in some domains, while being outperformed by only a significantly smaller margin in others.


The Joy of Forgetting: Faster Anytime Search via Restarting

AAAI Conferences

Anytime search algorithms solve optimisation problems by quickly finding a (usually suboptimal) first solution and then finding improved solutions when given additional time. To deliver an initial solution quickly, they are typically greedy with respect to the heuristic cost-to-go estimate h. In this paper, we show that this low-h bias can cause poor performance if the greedy search makes early mistakes. Building on this observation, we present a new anytime approach that restarts the search from the initial state every time a new solution is found. We demonstrate the utility of our method via experiments in PDDL planning as well as other domains, and show that it is particularly useful for problems where the heuristic has systematic errors.


Pattern Database Heuristics for Fully Observable Nondeterministic Planning

AAAI Conferences

When planning in an uncertain environment, one is often interested in finding a contingent plan that prescribes appropriate actions for all possible states that may be encountered during the execution of the plan. We consider the problem of finding strong cyclic plans for fully observable nondeterministic (FOND) planning problems. The algorithm we choose is LAO*, an informed explicit state search algorithm. We investigate the use of pattern database (PDB) heuristics to guide LAO* towards goal states. To obtain a fully domain-independent planning system, we use an automatic pattern selection procedure that performs local search in the space of pattern collections. The evaluation of our system on the FOND benchmarks of the Uncertainty Part of the International Planning Competition 2008 shows that our approach is competitive with symbolic regression search in terms of problem coverage, speed, and plan quality.


Classical Planning in MDP Heuristics: with a Little Help from Generalization

AAAI Conferences

Computing a good policy in stochastic uncertain environments with unknown dynamics and reward model parameters is a challenging task. In a number of domains, ranging from space robotics to epilepsy management, it may be possible to have an initial training period when suboptimal performance is permitted. For such problems it is important to be able to identify when this training period is complete, and the computed policy can be used with high confidence in its future performance. A simple principled criteria for identifying when training has completed is when the error bounds on the value estimates of the current policy are sufficiently small that the optimal policy is fixed, with high probability. We present an upper bound on the amount of training data required to identify the optimal policy as a function of the unknown separation gap between the optimal and the next-best policy values. We illustrate with several small problems that by estimating this gap in an online manner, the number of training samples to provably reach optimality can be significantly lower than predicted offline using a Probably Approximately Correct framework that requires an input epsilon parameter.


Self-Taught Decision Theoretic Planning with First Order Decision Diagrams

AAAI Conferences

We present a new paradigm for planning by learning, where the planner is given a model of the world and  a small set of states of interest, but no indication of optimal actions in these states. The additional information can help focus the planner on regions of the state space that are of interest and lead to improved performance. We demonstrate this idea by introducing novel model-checking reduction operations for First Order Decision Diagrams (FODD), a representation that has been used to implement decision-theoretic planning with Relational Markov Decision Processes (RMDP). Intuitively, these reductions modify the construction of the value function by removing any complex specifications that are irrelevant to the set of training examples, thereby focusing on the region of interest. We show that such training examples can be constructed on the fly from a description of the planning problem thus we can bootstrap to get a self-taught planning system. Additionally, we provide a new heuristic to embed universal and conjunctive goals within the framework of RMDP planners, expanding the scope and applicability of such systems. We show that these ideas lead to significant improvements in performance in terms of both speed and coverage of the planner, yielding state of the art planning performance on problems from the International Planning Competition.


Coming Up With Good Excuses: What to do When no Plan Can be Found

AAAI Conferences

When using a planner-based agent architecture, many things can go wrong. First and foremost, an agent might fail to execute one of the planned actions for some reasons. Even more annoying, however, is a situation where the agent is incompetent, i.e., unable to come up with a plan. This might be due to the fact that there are principal reasons that prohibit a successful plan or simply because the task's description is incomplete or incorrect. In either case, an explanation for such a failure would be very helpful. We will address this problem and provide a formalization of coming up with excuses for not being able to find a plan. Based on that, we will present an algorithm that is able to find excuses and demonstrate that such excuses can be found in practical settings in reasonable time.