Lipovetzky, Nir (University of Melbourne) | Muise, Christian (Massachusetts Institute of Technology) | Geffner, Hector (Universitat Pompeu Fabra)

We consider the problem of deriving formulas that capture traps, invariants, and dead-ends in classical planning through polynomial forms of preprocessing. An invariant is a formula that is true in the initial state and in all reachable states. A trap is a conditional invariant: once a state is reached that makes the trap true, all the states that are reachable from it will sat- isfy the trap formula as well. Finally, dead-ends are formulas that are satisfied in states that make the goal unreachable. We introduce a preprocessing algorithm that computes traps in k- DNF form that is exponential in the k parameter, and show how the algorithm can be used to precompute invariants and dead-ends. We report also preliminary tests that illustrate the effectiveness of the preprocessing algorithm for identifying dead-end states, and compare it with the identification that follows from the use of the h1 and h2 heuristics that cannot be preprocessed, and must be computed at run time.

Bonet, Blai (Universidad Simon Bolivar) | Geffner, Hector (ICREA and Universitat Pompeu Fabra)

We establish conditions under which memoryless policies and finite-state controllers that solve one partially observable non-deterministic problem (PONDP) generalize to other problems; namely, problems that have a similar structure and share the same action and observation space. This is relevant to generalized planning where plans that work for many problems are sought, and to transfer learning where knowledge gained in the solution of one problem is to be used on related problems. We use a logical setting where uncertainty is represented by sets of states and the goal is to be achieved with certainty. While this gives us crisp notions of solution policies and generalization, the account also applies to probabilistic PONDs, i.e., Goal POMDPs.

Bonet, Blai (Universidad Simon Bolivar) | Geffner, Hector (ICREA and Universitat Pompeu Fabra)

Belief tracking is a basic problem in planning with sensing. While the problem is intractable, it has been recently shown that for both deterministic and non-deterministic systems expressed in compact form, it can be done in time and space that are exponential in the problem width. The width measures the maximum number of state variables that are all relevant to a given precondition or goal. In this work, we extend this result both theoretically and practically. First, we introduce an alternative decomposition scheme and algorithm with the same time complexity but different completeness guarantees, whose space complexity is much smaller: exponential in the causal width of the problem that measures the number of state variables that are causally relevant to a given precondition, goal, or observable. Second, we introduce a fast, meaningful, and powerful approximation that trades completeness by speed, and is both time and space exponential in the problem causal width. It is then shown empirically that the algorithm combined with simple heuristics yields state-of-the-art real-time performance in domains with high widths but low causal widths such as Minesweeper, Battleship, and Wumpus.

Patrizi, Fabio (Università di Roma "La Sapienza") | Lipovetzky, Nir (The University of Melbourne) | Geffner, Hector ( Institucion Catalana de Recerca i Estudis Avançats Universitat Pompeu Fabra )

We consider the problem of planning in environments where the state is fully observable, actions have non-deterministic effects, and plans must generate infinite state trajectories for achieving a large class of LTL goals. More formally, we focus on the control synthesis problem under the assumption that the LTL formula to be realized can be mapped into a deterministic Bu ̈chi automaton. We show that by assuming that action non-determinism is fair, namely that infinite executions of a non-deterministic action in the same state yield each possible successor state an infinite number of times, the (fair) synthesis problem can be reduced to a standard strong cyclic planning task over reachability goals. Since strong cyclic planners are built on top of efficient classical planners, the transformation reduces the non-deterministic, fully observable, temporally extended planning task into the solution of classical planning problems. A number of experiments are reported showing the potential benefits of this approach to synthesis in comparison with state-of-the-art symbolic methods.

Srivastava, Siddharth (University of Massachusetts, Amherst) | Zilberstein, Shlomo (University of Massachusetts, Amherst) | Immerman, Neil (University of Massachusetts, Amherst) | Geffner, Hector (ICREA and Universitat Pompeu Fabra)

We consider a new class of planning problems involving a set of non-negative real variables, and a set of non-deterministic actions that increase or decrease the values of these variables by some arbitrary amount. The formulas specifying the initial state, goal state, or action preconditions can only assert whether certain variables are equal to zero or not. Assuming that the state of the variables is fully observable, we obtain two results. First, the solution to the problem can be expressed as a policy mapping qualitative states into actions, where a qualitative state includes a Boolean variable for each original variable, indicating whether its value is zero or not. Second, testing whether any such policy, that may express nested loops of actions, is a solution to the problem, can be determined in time that is polynomial in the qualitative state space, which is much smaller than the original infinite state space. We also report experimental results using a simple generate-and-test planner to illustrate these findings.

Ramírez, Miquel (Universitat Pompeu Fabra) | Geffner, Hector (ICREA and Universitat Pompeu Fabra)

Plan recognition is the problem of inferring the goals and plans of an agent from partial observations of her behavior. Recently, it has been shown that the problem can be formulated and solved using planners, reducing plan recognition to plan generation. In this work, we extend this model-based approach to plan recognition to the POMDP setting, where actions are stochastic and states are partially observable. The task is to infer a probability distribution over the possible goals of an agent whose behavior results from a POMDP model. The POMDP model is shared between agent and observer except for the true goal of the agent that is hidden to the observer. The observations are action sequences O that may contain gaps as some or even most of the actions done by the agent may not be observed. We show that the posterior goal distribution P ( G | O ) can be computed from the value function V G ( b ) over beliefs b generated by the POMDP planner for each possible goal G. Some extensions of the basic framework are discussed, and a number of experiments are reported.

Patrizi, Fabio (Imperial College London) | Lipoveztky, Nir (Universitat Pompeu Fabra) | Giacomo, Giuseppe De (Sapienza Università) | Geffner, Hector (di Roma)

Classical planning has been notably successful in synthesizing finite plans to achieve states where propositional goals hold. In the last few years, classical planning has also been extended to incorporate temporally extended goals, expressed in temporal logics such as LTL, to impose restrictions on the state sequences generated by finite plans. In this work, we take the next step and consider the computation of infinite plans for achieving arbitrary LTL goals. We show that infinite plans can also be obtained efficiently by calling a classical planner once over a classical planning encoding that represents and extends the composition of the planning domain and the Buchi automaton representing the goal. This compilation scheme has been implemented and a number of experiments are reported.

Bonet, Blai (Universidad Simon Bolivar) | Geffner, Hector (ICREA and Universitat Pompeu Fabra)

Planning with partial observability can be formulated as a non-deterministic search problem in belief space. The problem is harder than classical planning as keeping track of beliefs is harder than keeping track of states, and searching for action policies is harder than searching for action sequences. In this work, we develop a framework for partial observability that avoids these limitations and leads to a planner that scales up to larger problems. For this, the class of problems is restricted to those in which 1) the non-unary clauses representing the uncertainty about the initial situation are nvariant, and 2) variables that are hidden in the initial situation do not appear in the body of conditional effects, which are all assumed to be deterministic. We show that such problems can be translated in linear time into equivalent fully observable non-deterministic planning problems, and that an slight extension of this translation renders the problem solvable by means of classical planners. The whole approach is sound and complete provided that in addition, the state-space is connected. Experiments are also reported.

Lipovetzky, Nir (Universitat Pompeu Fabra) | Geffner, Hector (ICREA and Universitat Pompeu Fabra)

Current state-of-the-art planners solve problems, easy and hard alike, by search, expanding hundreds or thousands of nodes. Yet, given the ability of people to solve easy problems and to explain their solutions, it seems that an essential inferential component may be missing. The reasons expressed by people for selecting actions appear to be related to causal chains: sequences of causal links a i → p i + 1 , i = 0, ..., n – 1, such that a 0 is applicable in the current state, p i is a precondition of action a i , and p n is a goal. Some of these causal chains or paths appear to be good, some bad, others appear to be impossible. In this work, we focus on such paths and develop three techniques for performing inference over them from which a path-based planner is obtained. We define the conditions under which a path is consistent, provide an heuristic estimate of the cost of achieving the goal along a consistent path, and introduce a planning algorithm that uses paths as decomposition backbones. The resulting planner, called C3, is not complete and does not perform as well as recent planners that carry extensive but extremely efficient searches such as LAMA, but is competitive with FF and in particular, with FF running in EHC mode which yields very focused but incomplete searches, and thus provides, a more apt comparison. Moreover, many domains are solved backtrack-free, with no search at all, suggesting that planning with paths may be a meaningful idea both cognitively and computationally.

Bonet, Blai (Universidad Simon Bolivar) | Geffner, Hector (ICREA &)

Point-based algorithms and RTDP-Bel are approximate methods for solving POMDPs that replace the full updates of parallel value iteration by faster and more effective updates at selected beliefs. An important difference between the two methods is that the former adopt Sondik's representation of the value function, while the latter uses a tabular representation and a discretization function. The algorithms, however, have not been compared up to now, because they target different POMDPs: discounted POMDPs on the one hand, and Goal POMDPs on the other. In this paper, we bridge this representational gap, showing how to transform discounted POMDPs into Goal POMDPs, and use the transformation to compare RTDP-Bel with point-based algorithms over the existing discounted benchmarks. The results appear to contradict the conventional wisdom in the area showing that RTDP-Bel is competitive, and sometimes superior to point-based algorithms in both quality and time.