Recent work has shown how to improve delete relaxation heuristics by computing relaxed plans, i.e., the hFF heuristic, in a compiled planning task PiC which represents a given set C of fact conjunctions explicitly. While this compilation view of such partial delete relaxation is simple and elegant, its meaning with respect to the original planning task is opaque, and the size of PiC grows exponentially in |C|. We herein provide a direct characterization, without compilation, making explicit how the approach arises from a combination of the delete-relaxation with critical-path heuristics. Designing equations characterizing a novel view on h+ on the one hand, and a generalized version hC of hm on the other hand, we show that h+(PiC) can be characterized in terms of a combined hcplus equation. This naturally generalizes the standard delete-relaxation framework: understanding that framework as a relaxation over singleton facts as atomic subgoals, one can refine the relaxation by using the conjunctions C as atomic subgoals instead. Thanks to this explicit view, we identify the precise source of complexity in hFF(PiC), namely maximization of sets of supported atomic subgoals during relaxed plan extraction, which is easy for singleton-fact subgoals but is NP-complete in the general case. Approximating that problem greedily, we obtain a polynomial-time hCFF version of hFF(PiC), superseding the PiC compilation, and superseding the modified PiCce compilation which achieves the same complexity reduction but at an information loss. Experiments on IPC benchmarks show that these theoretical advantages can translate into empirical ones.
Unavoidable dead-ends are common in many probabilistic planning problems, e.g. when actions may fail or when operating under resource constraints. An important objective in such settings is MaxProb, determining the maximal probability with which the goal can be reached, and a policy achieving that probability. Yet algorithms for MaxProb probabilistic planning are severely underexplored, to the extent that there is scant evidence of what the empirical state of the art actually is. We close this gap with a comprehensive empirical analysis. We design and explore a large space of heuristic search algorithms, systematizing known algorithms and contributing several new algorithm variants. We consider MaxProb, as well as weaker objectives that we baptize AtLeastProb (requiring to achieve a given goal probabilty threshold) and ApproxProb (requiring to compute the maximum goal probability up to a given accuracy). We explore both the general case where there may be 0-reward cycles, and the practically relevant special case of acyclic planning, such as planning with a limited action-cost budget. We design suitable termination criteria, search algorithm variants, dead-end pruning methods using classical planning heuristics, and node selection strategies. We design a benchmark suite comprising more than 1000 instances adapted from the IPPC, resource-constrained planning, and simulated penetration testing. Our evaluation clarifies the state of the art, characterizes the behavior of a wide range of heuristic search algorithms, and demonstrates significant benefits of our new algorithm variants.
Enforced hill-climbing is an effective deterministic hill-climbing technique that deals with local optima using breadth-first search (a process called ``basin flooding''). We propose and evaluate a stochastic generalization of enforced hill-climbing for online use in goal-oriented probabilistic planning problems. We assume a provided heuristic function estimating expected cost to the goal with flaws such as local optima and plateaus that thwart straightforward greedy action choice. While breadth-first search is effective in exploring basins around local optima in deterministic problems, for stochastic problems we dynamically build and solve a heuristic-based Markov decision process (MDP) model of the basin in order to find a good escape policy exiting the local optimum. We note that building this model involves integrating the heuristic into the MDP problem because the local goal is to improve the heuristic. We evaluate our proposal in twenty-four recent probabilistic planning-competition benchmark domains and twelve probabilistically interesting problems from recent literature. For evaluation, we show that stochastic enforced hill-climbing (SEH) produces better policies than greedy heuristic following for value/cost functions derived in two very different ways: one type derived by using deterministic heuristics on a deterministic relaxation and a second type derived by automatic learning of Bellman-error features from domain-specific experience. Using the first type of heuristic, SEH is shown to generally outperform all planners from the first three international probabilistic planning competitions.
Markov decision processes are of major interest in the planning community as well as in the model checking community. But in spite of the similarity in the considered formal models, the development of new techniques and methods happened largely independently in both communities. This work is intended as a beginning to unite the two research branches. We consider goal-reachability analysis as a common basis between both communities. The core of this paper is the translation from Jani, an overarching input language for quantitative model checkers, into the probabilistic planning domain definition language (PPDDL), and vice versa from PPDDL into Jani. These translations allow the creation of an overarching benchmark collection, including existing case studies from the model checking community, as well as benchmarks from the international probabilistic planning competitions (IPPC). We use this benchmark set as a basis for an extensive empirical comparison of various approaches from the model checking community, variants of value iteration, and MDP heuristic search algorithms developed by the AI planning community. On a per benchmark domain basis, techniques from one community can achieve state-ofthe-art performance in benchmarks of the other community. Across all benchmark domains of one community, the performance comparison is however in favor of the solvers and algorithms of that particular community. Reasons are the design of the benchmarks, as well as tool-related limitations. Our translation methods and benchmark collection foster crossfertilization between both communities, pointing out specific opportunities for widening the scope of solvers to different kinds of models, as well as for exchanging and adopting algorithms across communities.
A fundamental concern in real-time planning is the presence of dead-ends in the state space, from which no goal is reachable. Recently, the SafeRTS algorithm was proposed for searching in such spaces. SafeRTS exploits a user-provided predicate to identify safe states, from which a goal is likely reachable, and attempts to maintain a backup plan for reaching a safe state at all times. In this paper, we study the SafeRTS approach, identify certain properties of its behavior, and design an improved framework for safe real-time search. We prove that the new approach performs at least as well as SafeRTS and present experimental results showing that its promise is fulfilled in practice.