Collaborating Authors

 Geffner, Hector


Learning First-Order Representations for Planning from Black-Box States: New Results

arXiv.org Artificial Intelligence

Recently Bonet and Geffner have shown that first-order representations for planning domains can be learned from the structure of the state space without any prior knowledge about the action schemas or domain predicates. For this, the learning problem is formulated as the search for a simplest first-order domain description D that, along with information about instances I_i (number of objects and initial state), determines state space graphs G(P_i) that match the observed state graphs G_i, where P_i = (D, I_i). The search is cast and solved approximately by means of a SAT solver that is called over a large family of propositional theories that differ just in the parameters encoding the possible number of action schemas and domain predicates, their arities, and the number of objects. In this work, we push the limits of these learners by moving to an answer set programming (ASP) encoding using the CLINGO system. The new encodings are more transparent and concise, extending the range of possible models while facilitating their exploration. We show that the domains introduced by Bonet and Geffner can be solved more efficiently in the new approach, often optimally, and furthermore, that the approach can be easily extended to handle partial information about the state graphs as well as noise that prevents some states from being distinguished.


Expressing and Exploiting the Common Subgoal Structure of Classical Planning Domains Using Sketches: Extended Version

arXiv.org Artificial Intelligence

Width-based planning methods exploit the use of conjunctive goals for decomposing problems into subproblems of low width. However, algorithms like SIW fail when the goal is not serializable. In this work, we address this limitation of SIW by using a simple but powerful language for expressing problem decompositions introduced recently by Bonet and Geffner, called policy sketches. A policy sketch R consists of a set of Boolean and numerical features and a set of sketch rules that express how the values of these features are supposed to change. Like general policies, policy sketches are domain general, but unlike policies, the changes captured by sketch rules do not need to be achieved in a single step. We show that many planning domains that cannot be solved by SIW are provably solvable in low polynomial time with the SIW_R algorithm, the version of SIW that employs user-provided policy sketches. Policy sketches are thus shown to be a powerful language for expressing domain-specific knowledge in a simple and compact way and a convenient alternative to languages such as HTNs or temporal logics. Furthermore, policy sketches make it easy to express general problem decompositions and prove key properties like their complexity and width.


Flexible FOND Planning with Explicit Fairness Assumptions

arXiv.org Artificial Intelligence

We consider the problem of reaching a propositional goal condition in fully-observable non-deterministic (FOND) planning under a general class of fairness assumptions that are given explicitly. The fairness assumptions are of the form A/B and say that state trajectories that contain infinite occurrences of an action a from A in a state s and finite occurrences of actions from B must also contain infinite occurrences of action a in s followed by each one of its possible outcomes. The infinite trajectories that violate this condition are deemed unfair, and the solutions are policies for which all the fair trajectories reach a goal state. We show that strong and strong-cyclic FOND planning, as well as QNP planning, a planning model introduced recently for generalized planning, are all special cases of FOND planning with fairness assumptions of this form, which can also be combined. FOND+ planning, as this form of planning is called, combines the syntax of FOND planning with some of the versatility of LTL for expressing fairness constraints. A new planner is implemented by reducing FOND+ planning to answer set programs, and the performance of the planner is evaluated in comparison with FOND and QNP planners, and LTL synthesis tools.
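As an illustration of one of the special cases named above, the following is a minimal sketch of a check for strong-cyclic solutions: a policy qualifies when a goal state stays reachable from every state the policy can reach. The dictionary-based encoding (`policy`, `outcomes`) and the toy states are assumptions made for this example only; this is not the ASP-based planner described in the abstract.

```python
def reachable(start, policy, outcomes, goals):
    """States reachable from start when following policy (goal states are terminal)."""
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        if s in goals or s not in policy:
            continue  # nothing to expand at goals or at states without an action
        for nxt in outcomes[(s, policy[s])]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_strong_cyclic(start, policy, outcomes, goals):
    """True if a goal is reachable from every state reachable under the policy."""
    return all(
        reachable(s, policy, outcomes, goals) & goals
        for s in reachable(start, policy, outcomes, goals)
    )


# Hypothetical toy problem: action "try" either loops in s0 or reaches the goal g.
policy = {"s0": "try"}
goals = {"g"}
good_outcomes = {("s0", "try"): ["s0", "g"]}
# Variant where "try" may lead to a dead end d instead of the goal.
bad_outcomes = {("s0", "try"): ["s0", "d"]}

print(is_strong_cyclic("s0", policy, good_outcomes, goals))
print(is_strong_cyclic("s0", policy, bad_outcomes, goals))
```

The first policy may loop forever in `s0`, but only along trajectories in which the goal outcome of `try` never occurs, which is the kind of trajectory a fairness assumption rules out.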


Learning General Policies from Small Examples Without Supervision

arXiv.org Artificial Intelligence

Generalized planning is concerned with the computation of general policies that solve multiple instances of a planning domain all at once. It has been recently shown that these policies can be computed in two steps: first, a suitable abstraction in the form of a qualitative numerical planning problem (QNP) is learned from sample plans, then the general policies are obtained from the learned QNP using a planner. In this work, we introduce an alternative approach for computing more expressive general policies which does not require sample plans or a QNP planner. The new formulation is very simple and can be cast in terms that are more standard in machine learning: a large but finite pool of features is defined from the predicates in the planning examples using a general grammar, and a small subset of features is sought for separating "good" from "bad" state transitions, and goals from non-goals. The problems of finding such a "separating surface" while labeling the transitions as "good" or "bad" are jointly addressed as a single combinatorial optimization problem expressed as a Weighted Max-SAT problem. The advantage of looking for the simplest policy in the given feature space that solves the given examples, possibly non-optimally, is that many domains have no general, compact policies that are optimal. The approach yields general policies for a number of benchmark domains.
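The idea of selecting a small subset of features that separates "good" from "bad" transitions can be sketched as follows. This is a deliberately simplified illustration: the labels are assumed to be given, each transition is assumed to be summarized by a tuple of feature changes, and a brute-force search over subsets stands in for the Weighted Max-SAT formulation, which in the abstract also chooses the labels. The function name and the data are hypothetical.

```python
from itertools import combinations


def smallest_separating_subset(good, bad, num_features):
    """Return the smallest tuple of feature indices that separates good from bad."""
    for size in range(num_features + 1):
        for subset in combinations(range(num_features), size):
            # Project every transition onto the chosen features.
            good_proj = {tuple(v[i] for i in subset) for v in good}
            bad_proj = {tuple(v[i] for i in subset) for v in bad}
            if not good_proj & bad_proj:  # no good/bad pair looks identical
                return subset
    return None  # the full feature pool cannot separate the two sets


# Hypothetical feature-change vectors over a pool of three features.
good_transitions = [(1, 0, 1), (1, 1, 0)]
bad_transitions = [(0, 0, 1), (0, 1, 0)]

print(smallest_separating_subset(good_transitions, bad_transitions, 3))
```

Enumerating subsets by increasing size returns a smallest separating set, at a cost exponential in the pool size; this is only workable for tiny pools like the one above.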


General Policies, Serializations, and Planning Width

arXiv.org Artificial Intelligence

It has been observed that in many of the benchmark planning domains, atomic goals can be reached with a simple polynomial exploration procedure, called IW, that runs in time exponential in the problem width. Such problems indeed have a bounded width: a width that does not grow with the number of problem variables and is often no greater than two. Yet, while the notion of width has become part of state-of-the-art planning algorithms like BFWS, there is still no good explanation for why so many benchmark domains have bounded width. In this work, we address this question by relating bounded width and serialized width to ideas of generalized planning, where general policies aim to solve multiple instances of a planning problem all at once. We show that bounded width is a property of planning domains that admit optimal general policies in terms of features that are explicitly or implicitly represented in the domain encoding. The results are extended to a much larger class of domains with bounded serialized width where the general policies do not have to be optimal. The study also leads to a new simple, meaningful, and expressive language for specifying domain serializations in the form of policy sketches, which can be used for encoding domain control knowledge by hand or for learning it from traces. The use of sketches and the meaning of the theoretical results are all illustrated through a number of examples.
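The IW procedure mentioned above can be sketched as a breadth-first search that prunes every newly generated state that does not contain some tuple of at most k atoms seen for the first time. The sketch below is a minimal illustration under assumptions: states are frozensets of atoms, and the `successors` interface and the line-world domain are hypothetical, not taken from any planner implementation.

```python
from collections import deque
from itertools import combinations


def iw(initial, successors, is_goal, k=1):
    """Breadth-first search keeping only states that contain a new tuple of at most k atoms."""
    seen_tuples = set()

    def is_novel(state):
        # Collect every tuple of up to k atoms in the state not seen before.
        new = {
            t
            for size in range(1, k + 1)
            for t in combinations(sorted(state), size)
        } - seen_tuples
        seen_tuples.update(new)
        return bool(new)

    is_novel(initial)  # register the atoms of the initial state
    queue = deque([(initial, [])])
    while queue:
        state, plan = queue.popleft()
        if is_goal(state):
            return plan
        for action, nxt in successors(state):
            if is_novel(nxt):  # prune states that add no new tuple
                queue.append((nxt, plan + [action]))
    return None


# Hypothetical toy domain: an agent on a line of 5 cells, one atom per position.
N_CELLS = 5


def line_successors(state):
    (_, i), = state  # the state holds the single atom ("at", i)
    if i > 0:
        yield "left", frozenset({("at", i - 1)})
    if i < N_CELLS - 1:
        yield "right", frozenset({("at", i + 1)})


plan = iw(frozenset({("at", 0)}), line_successors, lambda s: ("at", N_CELLS - 1) in s)
print(plan)
```

Because each state kept in the queue must contribute at least one new tuple, the number of expanded states is bounded by the number of atom tuples of size at most k, which is where the exponential dependence on the width parameter comes from.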


Factored Probabilistic Belief Tracking

arXiv.org Artificial Intelligence

The problem of belief tracking in the presence of stochastic actions and observations is pervasive and yet computationally intractable. In this work we show, however, that probabilistic beliefs can be maintained in factored form exactly and efficiently across a number of causally closed beams, when the state variables that appear in more than one beam obey a form of backward determinism. Since computing marginals from the factors is still computationally intractable in general, and variables appearing in several beams are not always backward-deterministic, the basic formulation is extended with two approximations: forms of belief propagation for computing marginals from factors, and sampling of non-backward-deterministic variables for making such variables backward-deterministic given their sampled history. Unlike Rao-Blackwellized particle filtering, the sampling is not used for making inference tractable but for making the factorization sound. The resulting algorithm involves sampling and belief propagation, or just one of them, as determined by the structure of the model.


Generalized Planning: Non-Deterministic Abstractions and Trajectory Constraints

arXiv.org Artificial Intelligence

We study the characterization and computation of general policies for families of problems that share a structure characterized by a common reduction into a single abstract problem. Policies $\mu$ that solve the abstract problem P have been shown to solve all problems Q that reduce to P provided that $\mu$ terminates in Q. In this work, we shed light on why this termination condition is needed and how it can be removed. The key observation is that the abstract problem P captures the common structure among the concrete problems Q that is local (Markovian) but misses common structure that is global. We show how such global structure can be captured by means of trajectory constraints that in many cases can be expressed as LTL formulas, thus reducing generalized planning to LTL synthesis. Moreover, for a broad class of problems that involve integer variables that can be increased or decreased, trajectory constraints can be compiled away, reducing generalized planning to fully observable non-deterministic planning.


Causal Belief Decomposition for Planning with Sensing: Completeness Results and Practical Approximation

arXiv.org Artificial Intelligence

Belief tracking is a basic problem in planning with sensing. While the problem is intractable, it has been recently shown that for both deterministic and non-deterministic systems expressed in compact form, it can be done in time and space that are exponential in the problem width. The width measures the maximum number of state variables that are all relevant to a given precondition or goal. In this work, we extend this result both theoretically and practically. First, we introduce an alternative decomposition scheme and algorithm with the same time complexity but different completeness guarantees, whose space complexity is much smaller: exponential in the causal width of the problem, which measures the number of state variables that are causally relevant to a given precondition, goal, or observable. Second, we introduce a fast, meaningful, and powerful approximation that trades completeness for speed, and is both time and space exponential in the problem causal width. It is then shown empirically that the algorithm combined with simple heuristics yields state-of-the-art real-time performance in domains with high widths but low causal widths such as Minesweeper, Battleship, and Wumpus.
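For reference, the flat (unfactored) form of belief tracking for non-deterministic systems can be sketched as below: the belief is the set of states deemed possible, progressed through each action and then filtered by each observation. This is the baseline whose cost grows with the number of states, not the decomposition scheme of the abstract; the function names and the toy system are assumptions for illustration.

```python
def progress(belief, action, outcomes):
    """States that may result from applying action in some state of the belief."""
    return {s2 for s in belief for s2 in outcomes(s, action)}


def filter_by_observation(belief, obs, sense):
    """Keep only the states consistent with the received observation."""
    return {s for s in belief if sense(s) == obs}


# Hypothetical system: positions 0..3; "right" may move one step or fail silently.
def move(s, action):
    return {s, min(s + 1, 3)} if action == "right" else {s}


# Hypothetical sensor: the agent observes only the parity of its position.
def parity(s):
    return s % 2


belief = progress({0}, "right", move)              # position 0 or 1 is possible
belief = filter_by_observation(belief, 1, parity)  # the odd reading rules out 0
print(belief)
```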


Action Selection for MDPs: Anytime AO* vs. UCT

arXiv.org Artificial Intelligence

In the presence of non-admissible heuristics, A* and other best-first algorithms can be converted into anytime optimal algorithms over OR graphs, by simply continuing the search after the first solution is found. The same trick, however, does not work for best-first algorithms over AND/OR graphs, that must be able to expand leaf nodes of the explicit graph that are not necessarily part of the best partial solution. Anytime optimal variants of AO* must thus address an exploration-exploitation tradeoff: they cannot just "exploit", they must keep exploring as well. In this work, we develop one such variant of AO* and apply it to finite-horizon MDPs. This Anytime AO* algorithm eventually delivers an optimal policy while using non-admissible random heuristics that can be sampled, as when the heuristic is the cost of a base policy that can be sampled with rollouts. We then test Anytime AO* for action selection over large infinite-horizon MDPs that cannot be solved with existing off-line heuristic search and dynamic programming algorithms, and compare it with UCT.

Introduction. One of the natural approaches for selecting actions in very large state spaces is by performing a limited amount of lookahead. In the context of discounted MDPs, Kearns, Mansour, and Ng have shown that near-optimal actions can be selected by considering a sampled lookahead tree that is sufficiently sparse, whose size depends on the discount factor and the suboptimality bound but not on the number of problem states (Kearns, Mansour, and Ng 1999).
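UCT, the baseline named in the abstract above, addresses the same exploration-exploitation tradeoff by selecting actions at each tree node with the UCB1 rule: the empirical mean plus a bonus that shrinks as an action is tried more often. The sketch below shows that rule alone, in a reward-maximizing form; the statistics layout and the constant `c` are assumptions for illustration, and nothing here reproduces Anytime AO*.

```python
import math


def ucb1_select(stats, c=math.sqrt(2)):
    """Pick an action from stats, a dict mapping action -> (visit_count, mean_reward)."""
    for action, (count, _) in stats.items():
        if count == 0:
            return action  # try every action once before using the bound
    total = sum(count for count, _ in stats.values())

    def upper_bound(action):
        count, mean = stats[action]
        return mean + c * math.sqrt(math.log(total) / count)

    return max(stats, key=upper_bound)


# Hypothetical node statistics: "a" looks better but "b" has barely been tried.
node_stats = {"a": (10, 0.5), "b": (1, 0.4)}
print(ucb1_select(node_stats))        # exploration bonus favors "b"
print(ucb1_select(node_stats, c=0))   # pure exploitation favors "a"
```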


Learning First-Order Symbolic Planning Representations from Plain Graphs

arXiv.org Artificial Intelligence

One of the main obstacles for developing flexible AI systems is the split between data-based learners and model-based solvers. Solvers such as classical planners are very flexible and can deal with a variety of problem instances and goals but require first-order symbolic models. Data-based learners, on the other hand, are robust but do not produce such representations. In this work we address this split by showing how the first-order symbolic representations that are used by planners can be learned from non-symbolic representations alone given by a number of observed system trajectories organized as graphs. The observations can be arbitrary, including raw images. What is required is that two observations are different iff they proceed from different states. The representation learning problem is formulated as the problem of inferring the simplest planning instances over a common first-order domain that can generate the structures of the observed graphs. A slightly richer version of the problem is also considered where actions are also observed and the graphs are labeled. The problem is expressed and solved via a SAT formulation that is shown to produce first-order representations for domains like Gripper, Blocks, and Hanoi. The work suggests that the target symbolic representations for planning encode the structure of the observed state space, not the observations themselves, as assumed in deep learning approaches.