We present a novel approach to learning heuristic functions for AI planning domains. Given a state, we view a relaxed plan (RP) found from that state as a relational database, which includes the current state and goal facts, the actions in the RP, and the actions' add and delete lists. We represent heuristic functions as linear combinations of generic features of the database, selecting features and weights using training data from solved problems in the target planning domain. Many recent competitive planners use RP-based heuristics but focus exclusively on the length of the RP, ignoring its other features. Because RP construction ignores delete lists, in many domains RP length dramatically underestimates the distance to a goal and therefore provides poor guidance. By using features that depend on deleted facts and other RP properties, our learned heuristics can potentially capture patterns that describe where such underestimation occurs. Experiments in the STRIPS domains of IPC 3 and 4 show that best-first search using the learned heuristic can outperform FF (Hoffmann & Nebel 2001), which provided our training data, and frequently outperforms the top performers in IPC 4.
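The idea of scoring a relaxed plan through features of its database can be sketched as follows. This is a minimal illustration, not the authors' feature language. It assumes that facts are plain strings and that the RP is an ordered list of actions. The three features are hypothetical examples chosen for illustration: RP length, the number of RP actions that need a fact deleted by an earlier RP action, and the number of goal facts deleted by the RP. The weights are made up here; in the approach described above they would be selected from training data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset     # precondition facts
    add: frozenset     # add list
    delete: frozenset  # delete list (ignored when the RP was built)

@dataclass
class RPDatabase:
    """Relational view of one relaxed plan: current state, goal, and RP actions in order."""
    state: frozenset
    goal: frozenset
    actions: list

def features(db: RPDatabase) -> list:
    # Feature 1: RP length, the only quantity FF-style heuristics use.
    length = float(len(db.actions))
    # Feature 2 (hypothetical): RP actions needing a fact that an earlier RP action
    # deletes, i.e. places where ignoring deletes may hide extra work.
    deleted_so_far, clobbered = set(), 0
    for a in db.actions:
        if a.pre & deleted_so_far:
            clobbered += 1
        deleted_so_far |= a.delete
    # Feature 3 (hypothetical): goal facts deleted by some RP action.
    deleted_goals = float(len(deleted_so_far & db.goal))
    return [length, float(clobbered), deleted_goals]

def heuristic(db: RPDatabase, weights: list) -> float:
    """Linear combination of database features; real weights would be learned."""
    return sum(w * f for w, f in zip(weights, features(db)))

def act(name, pre, add, delete):
    return Action(name, frozenset(pre), frozenset(add), frozenset(delete))

# Blocksworld: a is on b, and the goal is b on a. The relaxed plan has 3 steps,
# but a real plan needs 4 because a must be put down before b can be picked up.
rp = RPDatabase(
    state=frozenset({"on(a,b)", "clear(a)", "ontable(b)", "handempty"}),
    goal=frozenset({"on(b,a)"}),
    actions=[
        act("unstack(a,b)", {"on(a,b)", "clear(a)", "handempty"},
            {"holding(a)", "clear(b)"}, {"on(a,b)", "clear(a)", "handempty"}),
        act("pickup(b)", {"clear(b)", "ontable(b)", "handempty"},
            {"holding(b)"}, {"clear(b)", "ontable(b)", "handempty"}),
        act("stack(b,a)", {"holding(b)", "clear(a)"},
            {"on(b,a)", "clear(b)", "handempty"}, {"holding(b)", "clear(a)"}),
    ],
)
print(features(rp))                    # [3.0, 2.0, 0.0]
print(heuristic(rp, [1.0, 0.5, 1.0]))  # 4.0 with these made-up weights
```

In this toy case, the feature that counts delete-induced conflicts lets a linear model correct the RP length of 3 toward the true distance of 4.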
The availability of informed (but inadmissible) planning heuristics has enabled the development of highly scalable planning systems. Due to this success, a body of work has grown around modifying these heuristics to handle extensions to classical planning. Most recently, there has been interest in addressing partial satisfaction planning problems, but existing heuristics fail to capture the complex interactions between action selection and goal selection that arise in these problems. In this paper we provide a novel admissible heuristic based on linear programming, which we use to solve a relaxed version of the partial satisfaction planning problem. We combine this heuristic with a lookahead strategy in a branch-and-bound algorithm to solve a class of oversubscribed planning problems.
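The LP-based heuristic itself requires an LP solver, so the sketch below illustrates only the surrounding branch-and-bound idea on a toy goal-selection problem. It assumes hypothetical inputs: a per-goal `utility`, a fixed set of supporting actions per goal in `support`, and an action `cost` table. It also replaces the paper's LP relaxation with a much simpler admissible bound: each undecided goal contributes at most its utility minus the cost of the actions that no other goal needs. Because this bound never underestimates the achievable net benefit, pruning with it cannot discard the optimal goal set.

```python
def union(sets):
    out = set()
    for s in sets:
        out |= s
    return out

def net_benefit(goals, utility, support, cost):
    """Utility of the chosen goals minus the cost of the actions they need (shared actions paid once)."""
    actions = union(support[g] for g in goals)
    return sum(utility[g] for g in goals) - sum(cost[a] for a in actions)

def optimistic_gain(goal, utility, support, cost):
    """Upper bound on what adding `goal` can contribute: only actions no other goal uses are surely extra."""
    others = union(s for g, s in support.items() if g != goal)
    exclusive = support[goal] - others
    return max(0, utility[goal] - sum(cost[a] for a in exclusive))

def best_goal_subset(utility, support, cost):
    goals = sorted(utility)
    gain = {g: optimistic_gain(g, utility, support, cost) for g in goals}
    best = [0, frozenset()]  # incumbent: the empty goal set has zero net benefit

    def search(i, chosen):
        value = net_benefit(chosen, utility, support, cost)
        if value > best[0]:
            best[:] = [value, chosen]
        # Optimistic completion: current value plus the best case for every undecided goal.
        bound = value + sum(gain[g] for g in goals[i:])
        if i == len(goals) or bound <= best[0]:
            return  # prune: this subtree cannot beat the incumbent
        search(i + 1, chosen | {goals[i]})  # branch 1: pursue goals[i]
        search(i + 1, chosen)               # branch 2: skip goals[i]

    search(0, frozenset())
    return best[0], best[1]

# Toy instance: g1 and g2 share action a2, so g2 only pays off once g1 is chosen.
utility = {"g1": 10, "g2": 4, "g3": 3}
support = {"g1": {"a1", "a2"}, "g2": {"a2", "a3"}, "g3": {"a4"}}
cost = {"a1": 3, "a2": 5, "a3": 2, "a4": 6}
print(best_goal_subset(utility, support, cost))  # net benefit 4, goals {g1, g2}
```

Taken alone, g2 has a negative net benefit (4 - 7), but it becomes profitable once g1 already pays for the shared action. This is a small instance of the action/goal-selection interaction described above.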
We select policies for large Markov Decision Processes (MDPs) with compact first-order representations. We find policies that generalize well as the number of objects in the domain grows, potentially without bound. Existing dynamic-programming approaches based on flat, propositional, or first-order representations are either impractical here or do not naturally scale as the number of objects grows without bound. We implement and evaluate an alternative approach that induces first-order policies from training data obtained by solving small problem instances with PGraphplan (Blum & Langford, 1999). Our policies are represented as ensembles of decision lists, using a taxonomic concept language. This approach extends the work of Martin and Geffner (2000) to stochastic domains, ensemble learning, and a wider variety of problems. Empirically, we find "good" policies for several stochastic first-order MDPs that are beyond the scope of previous approaches. We also discuss the application of this work to the relational reinforcement-learning problem.
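A policy represented as an ensemble of decision lists can be sketched as follows. Each list returns the action of its first rule whose condition holds, and the ensemble combines the lists' proposals. The example uses a toy package-delivery encoding and two hypothetical combinators, `prim` and `minus`, as a very small stand-in for a taxonomic concept language. A rule fires when its concept denotes a nonempty set of objects, and ties are broken by taking the alphabetically first object. The ensemble uses simple majority voting, which is an assumption and not necessarily the combination scheme used in the paper.

```python
from collections import Counter
from typing import Callable, Optional

State = frozenset                 # ground facts such as ("loaded", "p2")
Concept = Callable[[State], set]  # a concept denotes a set of objects in a state

def prim(pred: str) -> Concept:
    """Primitive concept: objects x with a unary fact (pred, x)."""
    return lambda s: {f[1] for f in s if f[0] == pred and len(f) == 2}

def minus(c1: Concept, c2: Concept) -> Concept:
    """Objects in c1 but not in c2 (intersection with a complement)."""
    return lambda s: c1(s) - c2(s)

def decision_list_action(rules: list, state: State) -> Optional[tuple]:
    for action, concept in rules:
        objects = concept(state)
        if objects:                        # the first rule with a nonempty concept fires
            return (action, min(objects))  # hypothetical tie-break: smallest object name
    return None

def ensemble_action(lists: list, state: State) -> Optional[tuple]:
    """Majority vote over the actions proposed by each decision list."""
    votes = Counter(a for a in (decision_list_action(r, state) for r in lists) if a is not None)
    return votes.most_common(1)[0][0] if votes else None

loaded = prim("loaded")
waiting = minus(minus(prim("package"), loaded), prim("at-goal"))
policy = [
    [("unload", loaded), ("load", waiting)],
    [("load", waiting), ("unload", loaded)],
    [("unload", loaded)],
]
state = frozenset({("package", "p1"), ("package", "p2"), ("package", "p3"),
                   ("loaded", "p2"), ("at-goal", "p1")})
print(ensemble_action(policy, state))  # ('unload', 'p2'): two of three lists agree
```

Because rules refer to concepts instead of specific objects, the same policy applies unchanged to states with any number of packages.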
We study an approach to learning heuristics for planning domains from example solutions. There has been little work on learning heuristics for the types of domains used in deterministic and stochastic planning competitions. Perhaps one reason for this is the challenge of providing a compact heuristic language that facilitates learning. Here we introduce a new representation for heuristics based on lists of set expressions described using taxonomic syntax. Next, we review the idea of a measure of progress (Parmar 2002), which is any heuristic that is guaranteed to be improvable at every state. We take finding a measure of progress as our learning goal, and describe a simple learning algorithm for this purpose. We evaluate our approach across a range of deterministic and stochastic planning-competition domains. The results show that greedily following the learned heuristic is often highly effective. We also show that our heuristic can be combined with learned rule-based policies, producing still stronger results.
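A heuristic written as a list of set expressions can be read as a tuple of set cardinalities, and greedy search can then follow it directly. The sketch below assumes that tuples are compared lexicographically and that larger values mean more progress. It uses plain Python functions as stand-ins for taxonomic set expressions and a hypothetical two-item cut-then-paint domain. Greedy search moves to the successor with the best tuple and fails if no successor improves on the current state, which is exactly the situation a measure of progress rules out.

```python
def measure(state, exprs):
    """Heuristic value: the size of each set expression, in list order."""
    return tuple(len(e(state)) for e in exprs)

def greedy(state, successors, is_goal, exprs, max_steps=100):
    """Follow the heuristic greedily; return the action sequence, or None on failure."""
    plan = []
    for _ in range(max_steps):
        if is_goal(state):
            return plan
        options = successors(state)  # list of (action, next_state) pairs
        if not options:
            return None
        # Tuples compare lexicographically, so earlier expressions dominate later ones.
        action, nxt = max(options, key=lambda o: measure(o[1], exprs))
        if measure(nxt, exprs) <= measure(state, exprs):
            return None  # no improving successor: not a measure of progress here
        plan.append(action)
        state = nxt
    return None

# Hypothetical domain: each item goes raw -> cut -> painted.
def successors(state):
    step = {"raw": ("cut", "cut"), "cut": ("paint", "painted")}
    out = []
    for item, status in sorted(state):
        if status in step:
            verb, new_status = step[status]
            out.append(((verb, item), state - {(item, status)} | {(item, new_status)}))
    return out

def painted(s):
    return {i for i, st in s if st == "painted"}

def cut(s):
    return {i for i, st in s if st == "cut"}

def all_painted(s):
    return all(st == "painted" for _, st in s)

start = frozenset({("i1", "raw"), ("i2", "raw")})
plan = greedy(start, successors, all_painted, [painted, cut])
print(plan)  # [('cut', 'i1'), ('paint', 'i1'), ('cut', 'i2'), ('paint', 'i2')]
```

Ordering the expression list as (painted, cut) makes finishing an item outrank preparing another one, so every greedy step strictly improves the tuple.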
The current evaluation functions for heuristic planning are expensive to compute. In many domains these functions provide good guidance toward a solution, so the computational effort is worthwhile. Conversely, where this does not hold, heuristic planners perform many useless node evaluations, which makes them scale poorly. In this paper we present a novel approach for boosting the scalability of heuristic planners based on automatically learning domain-specific search control knowledge in the form of relational decision trees. In particular, we define the learning of planning search control as a standard classification process.
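Framing search control as classification means that a classifier labels the candidate actions at a search node, so the planner can commit to an action without evaluating the expensive heuristic on every successor. The sketch below hand-writes a tiny relational decision tree for Blocksworld instead of learning one. The node tests (`stacks_onto_goal`, `unstacks_misplaced`), the labels, and the fallback behavior are hypothetical. Each test is relational because it relates the action's arguments to goal facts rather than checking a fixed attribute.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union

Test = Callable[[frozenset, frozenset, tuple], bool]  # (state, goal, action) -> bool

@dataclass
class Leaf:
    label: str  # "selected" or "rejected"

@dataclass
class Node:
    test: Test
    yes: "Union[Leaf, Node]"
    no: "Union[Leaf, Node]"

def classify(tree, state, goal, action) -> str:
    while isinstance(tree, Node):
        tree = tree.yes if tree.test(state, goal, action) else tree.no
    return tree.label

def choose(tree, state, goal, candidates) -> Optional[tuple]:
    """Return the first candidate the tree selects; None means fall back to heuristic search."""
    for action in candidates:
        if classify(tree, state, goal, action) == "selected":
            return action
    return None

def stacks_onto_goal(state, goal, action):
    # stack(x, y) where on(x, y) is a goal fact
    return action[0] == "stack" and ("on", action[1], action[2]) in goal

def unstacks_misplaced(state, goal, action):
    # unstack(x, y) where on(x, y) is not required by the goal
    return action[0] == "unstack" and ("on", action[1], action[2]) not in goal

tree = Node(stacks_onto_goal, Leaf("selected"),
            Node(unstacks_misplaced, Leaf("selected"), Leaf("rejected")))

state = frozenset({("holding", "b"), ("clear", "a"), ("ontable", "a")})
goal = frozenset({("on", "b", "a")})
candidates = [("putdown", "b"), ("stack", "b", "a")]
print([classify(tree, state, goal, c) for c in candidates])  # ['rejected', 'selected']
print(choose(tree, state, goal, candidates))                 # ('stack', 'b', 'a')
```

In a learned version, the tests and leaf labels would be induced from solved problems, with training examples labeled by the actions chosen in those solutions.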