Geffner, Hector

Features, Projections, and Representation Change for Generalized Planning Artificial Intelligence

Generalized planning is concerned with the characterization and computation of plans that solve many instances at once. In the standard formulation, a generalized plan is a mapping from feature or observation histories into actions, assuming that the instances share a common pool of features and actions. This assumption, however, excludes the standard relational planning domains where actions and objects change across instances. In this work, we extend the standard formulation of generalized planning to such domains. This is achieved by projecting the actions over the features, resulting in a common set of abstract actions which can be tested for soundness and completeness, and which can be used for generating general policies such as "if the gripper is empty, pick the clear block above x and place it on the table" that achieve the goal clear(x) in any Blocksworld instance. In this policy, "pick the clear block above x" is an abstract action that may represent the action Unstack(a, b) in one situation and the action Unstack(b, c) in another. Transformations are also introduced for computing such policies by means of fully observable non-deterministic (FOND) planners. The value of generalized representations for learning general policies is also discussed.

Planning With Pixels in (Almost) Real Time

AAAI Conferences

Recently, width-based planning methods have been shown to yield state-of-the-art results in the Atari 2600 video games. For this, the states were associated with the (RAM) memory states of the simulator. In this work, we consider the same planning problem but using the screen instead. By using the same visual inputs, the planning results can be compared with those of humans and learning methods. We show that the planning approach, out of the box and without training, results in scores that compare well with those obtained by humans and learning methods, and moreover, by developing an episodic, rollout version of the IW(k) algorithm, we show that such scores can be obtained in almost real time.

Multiagent Online Planning with Nested Beliefs and Dialogue

AAAI Conferences

The problem of planning with partial observability in the presence of a single agent has been addressed as a contingent or POMDP problem. Since the task is computationally hard, on-line approaches have also been developed that just compute the action to do next rather than full policies. In this work, we address a similar problem but in a multiagent setting where agents share a common goal and plan with beliefs which are about the world and the possibly nested beliefs of other agents. For this, we extend the belief tracking formulation of Kominis and Geffner to the on-line setting where plans are supposed to work for the true hidden state as revealed by the observations, and develop an alternative translation into classical planning that is used within a plan-execute-observe-and-replan cycle. Planning is done from the perspective of the agents, and there is a single planning agent in each replanning episode that can change across episodes. We present empirical results and show that interesting agent dialogues arise in this setting where agents collaborate by requesting or volunteering information in a goal-directed manner.

A Polynomial Planning Algorithm That Beats LAMA and FF

AAAI Conferences

It has been shown recently that heuristic and width-based search can be combined to produce planning algorithms with a performance that goes beyond the state-of-the-art. Such algorithms are based on best-first width search (BFWS), a plain best-first search set with evaluations functions combined lexicographically to break ties, some of which express novelty based preferences. In BFWS(f5), for example, the evaluation function f5 weights nodes by a novelty measure, breaking ties by the number of non-achieved goals. BFWS(f5) is a best-first algorithm, and hence, it is complete but not polynomial, and its performance doesn’t match the state of the art. In this work we show, however, that incomplete versions of BFWS(f5) where nodes with novelty greater than k are pruned, are not only polynomial but have an empirical performance that is better than both BFWS(f5) and state-of-the-art planners. This is shown by considering all the international planning competition instances. This is the first time where polynomial algorithms with meaningful bounds are shown to achieve state-of-the-art performance in planning. Practical and theoretical implications of this empirical finding are briefly sketched.

Best-First Width Search: Exploration and Exploitation in Classical Planning

AAAI Conferences

It has been shown recently that the performance of greedy best-first search (GBFS) for computing plans that are not necessarily optimal can be improved by adding forms of exploration when reaching heuristic plateaus: from random walks to local GBFS searches. In this work, we address this problem but using structural exploration methods resulting from the ideas of width-based search. Width-based methodsseek novel states, are not goal oriented, and their power has been shown recently in the Atari and GVG-AI video-games. We show first that width-based exploration in GBFS is more effective than GBFS with local GBFS search (GBFS-LS), and then proceed to formulate a simple and general computational framework where standard goal-oriented search (exploitation) and width-based search (structural exploration) are combined to yield a search scheme, best-first width search, that is better than both and which results in classical planning algorithms that outperform the state-of-the-art planners.

Traps, Invariants, and Dead-Ends

AAAI Conferences

We consider the problem of deriving formulas that capture traps, invariants, and dead-ends in classical planning through polynomial forms of preprocessing. An invariant is a formula that is true in the initial state and in all reachable states. A trap is a conditional invariant: once a state is reached that makes the trap true, all the states that are reachable from it will sat- isfy the trap formula as well. Finally, dead-ends are formulas that are satisfied in states that make the goal unreachable. We introduce a preprocessing algorithm that computes traps in k- DNF form that is exponential in the k parameter, and show how the algorithm can be used to precompute invariants and dead-ends. We report also preliminary tests that illustrate the effectiveness of the preprocessing algorithm for identifying dead-end states, and compare it with the identification that follows from the use of the h1 and h2 heuristics that cannot be preprocessed, and must be computed at run time.

Policies that Generalize: Solving Many Planning Problems with the Same Policy

AAAI Conferences

We establish conditions under which memoryless policies and finite-state controllers that solve one partially observable non-deterministic problem (PONDP) generalize to other problems; namely, problems that have a similar structure and share the same action and observation space. This is relevant to generalized planning where plans that work for many problems are sought, and to transfer learning where knowledge gained in the solution of one problem is to be used on related problems. We use a logical setting where uncertainty is represented by sets of states and the goal is to be achieved with certainty. While this gives us crisp notions of solution policies and generalization, the account also applies to probabilistic  PONDs, i.e., Goal POMDPs.

Classical Planning Algorithms on the Atari Video Games

AAAI Conferences

The Atari 2600 games supported in the Arcade Learning Environment (Bellemare et al. 2013) all feature aknown initial (RAM) state and actions that have deterministic effects. Classical planners, however, cannot be used for selecting actions for two reasons: first, nocompact PDDL-model of the games is given, and more importantly, the action effects and goals are not known a priori. Moreover, in these games there is usually no set of goals to be achieved but rewards to be collected. These features do not preclude the use of classical algorithms like breadth-first search or Dijkstra’s algorithm, but these methods are not effective over large state spaces. We thus turn to a different class of classical planning algorithms introduced recently that perform a structured exploration of the state space; namely, like breadth-first search and Dijkstra’s algorithm they are“blind” and hence do not require prior knowledge of state transitions, costs (rewards) or goals, and yet, like heuristic search algorithms, they have been shown to be effective for solving problems over huge state spaces.The simplest such algorithm, called Iterated Width or IW, consists of a sequence of calls IW(1), IW(2), . . . ,IW(k) where IW(i) is a breadth-first search in which a state is pruned when it is not the first state in the search to make true some subset of i atoms. The empirical results over 54 games suggest that the performance of IW with the k parameter fixed to 1, i.e., IW(1), is at the level of the state of the art represented by UCT. A simple best-first variation of IW that combines exploration and exploitation proves to be very competitive as well.

Soft Goals Can Be Compiled Away Artificial Intelligence

Soft goals extend the classical model of planning with a simple model of preferences. The best plans are then not the ones with least cost but the ones with maximum utility, where the utility of a plan is the sum of the utilities of the soft goals achieved minus the plan cost. Finding plans with high utility appears to involve two linked problems: choosing a subset of soft goals to achieve and finding a low-cost plan to achieve them. New search algorithms and heuristics have been developed for planning with soft goals, and a new track has been introduced in the International Planning Competition (IPC) to test their performance. In this note, we show however that these extensions are not needed: soft goals do not increase the expressive power of the basic model of planning with action costs, as they can easily be compiled away. We apply this compilation to the problems of the net-benefit track of the most recent IPC, and show that optimal and satisficing cost-based planners do better on the compiled problems than optimal and satisficing net-benefit planners on the original problems with explicit soft goals. Furthermore, we show that penalties, or negative preferences expressing conditions to avoid, can also be compiled away using a similar idea.

Compiling Uncertainty Away in Conformant Planning Problems with Bounded Width Artificial Intelligence

Conformant planning is the problem of finding a sequence of actions for achieving a goal in the presence of uncertainty in the initial state or action effects. The problem has been approached as a path-finding problem in belief space where good belief representations and heuristics are critical for scaling up. In this work, a different formulation is introduced for conformant problems with deterministic actions where they are automatically converted into classical ones and solved by an off-the-shelf classical planner. The translation maps literals L and sets of assumptions t about the initial situation, into new literals KL/t that represent that L must be true if t is initially true. We lay out a general translation scheme that is sound and establish the conditions under which the translation is also complete. We show that the complexity of the complete translation is exponential in a parameter of the problem called the conformant width, which for most benchmarks is bounded. The planner based on this translation exhibits good performance in comparison with existing planners, and is the basis for T0, the best performing planner in the Conformant Track of the 2006 International Planning Competition.