heuristic estimate
Beyond Single-Step Updates: Reinforcement Learning of Heuristics with Limited-Horizon Search
Hadar, Gal, Agostinelli, Forest, Shperberg, Shahaf S.
Many sequential decision-making problems can be formulated as shortest-path problems, where the objective is to reach a goal state from a given starting state. Heuristic search is a standard approach for solving such problems, relying on a heuristic function to estimate the cost to the goal from any given state. Recent approaches leverage reinforcement learning to learn heuristics by applying deep approximate value iteration. These methods typically rely on single-step Bellman updates, where the heuristic of a state is updated based on its best neighbor and the corresponding edge cost. This work proposes a generalized approach that enhances both state sampling and heuristic updates by performing limited-horizon searches and updating each state's heuristic based on the shortest path to the search frontier, incorporating both edge costs and the heuristic values of frontier states.
A Surprisingly Simple Continuous-Action POMDP Solver: Lazy Cross-Entropy Search Over Policy Trees
Hoerger, Marcus, Kurniawati, Hanna, Kroese, Dirk, Ye, Nan
The Partially Observable Markov Decision Process (POMDP) provides a principled framework for decision making in stochastic partially observable environments. However, computing good solutions for problems with continuous action spaces remains challenging. To ease this challenge, we propose a simple online POMDP solver, called Lazy Cross-Entropy Search Over Policy Trees (LCEOPT). At each planning step, our method uses a novel lazy Cross-Entropy method to search the space of policy trees, which provide a simple policy representation. Specifically, we maintain a distribution on promising finite-horizon policy trees. The distribution is iteratively updated by sampling policies, evaluating them via Monte Carlo simulation, and refitting them to the top-performing ones. Our method is lazy in the sense that it exploits the policy tree representation to avoid redundant computations in policy sampling, evaluation, and distribution update. This leads to computational savings of up to two orders of magnitude. Our LCEOPT is surprisingly simple as compared to existing state-of-the-art methods, yet empirically outperforms them on several continuous-action POMDP problems, particularly for problems with higher-dimensional action spaces.
Formalizing the presumption of independence
Christiano, Paul, Neyman, Eric, Xu, Mark
Mathematical proof aims to deliver confident conclusions, but a very similar process of deduction can be used to make uncertain estimates that are open to revision. A key ingredient in such reasoning is the use of a "default" estimate of $\mathbb{E}[XY] = \mathbb{E}[X] \mathbb{E}[Y]$ in the absence of any specific information about the correlation between $X$ and $Y$, which we call *the presumption of independence*. Reasoning based on this heuristic is commonplace, intuitively compelling, and often quite successful -- but completely informal. In this paper we introduce the concept of a heuristic estimator as a potential formalization of this type of defeasible reasoning. We introduce a set of intuitively desirable coherence properties for heuristic estimators that are not satisfied by any existing candidates. Then we present our main open problem: is there a heuristic estimator that formalizes intuitively valid applications of the presumption of independence without also accepting spurious arguments?
Saturated Cost Partitioning for Optimal Classical Planning
Seipp, Jendrik (University of Basel) | Keller, Thomas (University of Basel) | Helmert, Malte (University of Basel)
Cost partitioning is a method for admissibly combining a set of admissible heuristic estimators by distributing operator costs among the heuristics. Computing an optimal cost partitioning, i.e., the operator cost distribution that maximizes the heuristic value, is often prohibitively expensive to compute. Saturated cost partitioning is an alternative that is much faster to compute and has been shown to yield high-quality heuristics. However, its greedy nature makes it highly susceptible to the order in which the heuristics are considered. We propose a greedy algorithm to generate orders and show how to use hill-climbing search to optimize a given order. Combining both techniques leads to significantly better heuristic estimates than using the best random order that is generated in the same time. Since there is often no single order that gives good guidance on the whole state space, we use the maximum of multiple orders as a heuristic that is significantly better informed than any single-order heuristic, especially when we actively search for a set of diverse orders.
A Comparison of Cost Partitioning Algorithms for Optimal Classical Planning
Seipp, Jendrik (University of Basel) | Keller, Thomas (University of Basel) | Helmert, Malte (University of Basel)
Cost partitioning is a general and principled approach for constructing additive admissible heuristics for state-space search. Cost partitioning approaches for optimal classical planning include optimal cost partitioning, uniform cost partitioning, zero-one cost partitioning, saturated cost partitioning, post-hoc optimization and the canonical heuristic for pattern databases. We compare these algorithms theoretically, showing that saturated cost partitioning dominates greedy zero-one cost partitioning. As a side effect of our analysis, we obtain a new cost partitioning algorithm dominating uniform cost partitioning. We also evaluate these algorithms experimentally on pattern databases, Cartesian abstractions and landmark heuristics, showing that saturated cost partitioning is usually the method of choice on the IPC benchmark suite.
Adapting Novelty to Classical Planning as Heuristic Search
Katz, Michael (IBM Watson Health) | Lipovetzky, Nir (University of Melbourne) | Moshkovich, Dany (IBM Watson Health) | Tuisov, Alexander (The Technion-Israel Institute of Technology)
The introduction of the concept of state novelty has advanced the state of the art in deterministic online planning in Atari-like problems and in planning with rewards in general, when rewards are defined on states. In classical planning, however, the success of novelty as the dichotomy between novel and non-novel states was somewhat limited. Until very recently, novelty-based methods were not able to successfully compete with state-of-the-art heuristic search based planners. In this work we adapt the concept of novelty to heuristic search planning, defining the novelty of a state with respect to its heuristic estimate. We extend the dichotomy between novel and non-novel states and quantify the novelty degree of state facts. We then show a variety of heuristics based on the concept of novelty and exploit the recently introduced best-first width search for satisficing classical planning. Finally, we empirically show the resulting planners to significantly improve the state of the art in satisficing planning.
Incorporating Domain-Independent Planning Heuristics in Hierarchical Planning
Shivashankar, Vikas (Knexus Research Corporation) | Alford, Ron (MITRE) | Aha, David W. (Naval Research Laboratory)
Heuristics serve as a powerful tool in modern domain-independent planning (DIP) systems by providing critical guidance during the search for high-quality solutions. However, they have not been broadly used with hierarchical planning techniques, which are more expressive and tend to scale better in complex domains by exploiting additional domain-specific knowledge. Complicating matters, we show that for Hierarchical Goal Network (HGN) planning, a goal-based hierarchical planning formalism that we focus on in this paper, any poly-time heuristic that is derived from a delete-relaxation DIP heuristic has to make some relaxation of the hierarchical semantics. To address this, we present a principled framework for incorporating DIP heuristics into HGN planning using a simple relaxation of the HGN semantics we call Hierarchy-Relaxation. This framework allows for computing heuristic estimates of HGN problems using any DIP heuristic in an admissibility-preserving manner. We demonstrate the feasibility of this approach by using the LMCut heuristic to guide an optimal HGN planner. Our empirical results with three benchmark domains demonstrate that simultaneously leveraging hierarchical knowledge and heuristic guidance substantially improves planning performance.
Heuristics for Cost-Optimal Classical Planning Based on Linear Programming
Pommerening, Florian (Universitat Basel) | Roger, Gabriele (Universitat Basel) | Helmert, Malte (Universitat Basel) | Bonet, Blai (Universidad Simon Bolivar)
This model is used to automatically synthetise a controller that maps executions to the next action to perform. Many heuristics for cost-optimal planning are The problem is thus cast as a synthesis problem from a based on linear programming. We cover several given specification. Two obstacles for this approach are that interesting heuristics of this type by a common a suitable model for the task is needed, and that the synthesis framework that fixes the objective function of the problem is intractable in general. But, this intractability does linear program. Within the framework, constraints not preclude the approach from being effective in meaningful from different heuristics can be combined in one cases. Planning is the model-based approach to autonomous heuristic estimate which dominates the maximum behaviour.
Finding and Exploiting LTL Trajectory Constraints in Heuristic Search
Simon, Salomé (University of Basel) | Röger, Gabriele (University of Basel)
Temporal logics allow to formulate and reason about the development A unified formalism for these techniques would offer two of logic-based systems, for example about paths main advantages: decoupling the derivation and exploitation in factored state spaces. These are for instance common in of information and easily combining different sources planning, where temporal logics have always been present. of information. As one extreme, the entire planning task can be specified in a Currently the derivation and exploitation of information temporal logic language and plans are generated by theorem are integrated in most cases: someone proposes a new source proving (Koehler and Treinen 1995) or model construction of information and shows how it can correctly be exploited (Cerrito and Mayer 1998).
From Non-Negative to General Operator Cost Partitioning
Pommerening, Florian (University of Basel) | Helmert, Malte (University of Basel) | Röger, Gabriele (University of Basel) | Seipp, Jendrik (University of Basel)
Operator cost partitioning is a well-known technique to make admissible heuristics additive by distributing the operator costs among individual heuristics. Planning tasks are usually defined with non-negative operator costs and therefore it appears natural to demand the same for the distributed costs. We argue that this requirement is not necessary and demonstrate the benefit of using general cost partitioning. We show that LP heuristics for operator-counting constraints are cost-partitioned heuristics and that the state equation heuristic computes a cost partitioning over atomic projections. We also introduce a new family of potential heuristics and show their relationship to general cost partitioning.