Geffner, Hector

Compact Policies for Fully-Observable Non-Deterministic Planning as SAT Artificial Intelligence

Fully observable non-deterministic (FOND) planning is becoming increasingly important as an approach for computing proper policies in probabilistic planning, extended temporal plans in LTL planning, and general plans in generalized planning. In this work, we introduce a SAT encoding for FOND planning that is compact and can produce compact strong cyclic policies. Simple variations of the encodings are also introduced for strong planning and for what we call, dual FOND planning, where some non-deterministic actions are assumed to be fair (e.g., probabilistic) and others unfair (e.g., adversarial). The resulting FOND planners are compared empirically with existing planners over existing and new benchmarks. The notion of "probabilistic interesting problems" is also revisited to yield a more comprehensive picture of the strengths and limitations of current FOND planners and the proposed SAT approach.

Model-free, Model-based, and General Intelligence Artificial Intelligence

During the 60s and 70s, AI researchers explored intuitions about intelligence by writing programs that displayed intelligent behavior. Many good ideas came out from this work but programs written by hand were not robust or general. After the 80s, research increasingly shifted to the development of learners capable of inferring behavior and functions from experience and data, and solvers capable of tackling well-defined but intractable models like SAT, classical planning, Bayesian networks, and POMDPs. The learning approach has achieved considerable success but results in black boxes that do not have the flexibility, transparency, and generality of their model-based counterparts. Model-based approaches, on the other hand, require models and scalable algorithms. Model-free learners and model-based solvers have close parallels with Systems 1 and 2 in current theories of the human mind: the first, a fast, opaque, and inflexible intuitive mind; the second, a slow, transparent, and flexible analytical mind. In this paper, I review developments in AI and draw on these theories to discuss the gap between model-free learners and model-based solvers, a gap that needs to be bridged in order to have intelligent systems that are robust and general.

Features, Projections, and Representation Change for Generalized Planning Artificial Intelligence

Generalized planning is concerned with the characterization and computation of plans that solve many instances at once. In the standard formulation, a generalized plan is a mapping from feature or observation histories into actions, assuming that the instances share a common pool of features and actions. This assumption, however, excludes the standard relational planning domains where actions and objects change across instances. In this work, we extend the standard formulation of generalized planning to such domains. This is achieved by projecting the actions over the features, resulting in a common set of abstract actions which can be tested for soundness and completeness, and which can be used for generating general policies such as "if the gripper is empty, pick the clear block above x and place it on the table" that achieve the goal clear(x) in any Blocksworld instance. In this policy, "pick the clear block above x" is an abstract action that may represent the action Unstack(a, b) in one situation and the action Unstack(b, c) in another. Transformations are also introduced for computing such policies by means of fully observable non-deterministic (FOND) planners. The value of generalized representations for learning general policies is also discussed.

Planning With Pixels in (Almost) Real Time

AAAI Conferences

Recently, width-based planning methods have been shown to yield state-of-the-art results in the Atari 2600 video games. For this, the states were associated with the (RAM) memory states of the simulator. In this work, we consider the same planning problem but using the screen instead. By using the same visual inputs, the planning results can be compared with those of humans and learning methods. We show that the planning approach, out of the box and without training, results in scores that compare well with those obtained by humans and learning methods, and moreover, by developing an episodic, rollout version of the IW(k) algorithm, we show that such scores can be obtained in almost real time.

Multiagent Online Planning with Nested Beliefs and Dialogue

AAAI Conferences

The problem of planning with partial observability in the presence of a single agent has been addressed as a contingent or POMDP problem. Since the task is computationally hard, on-line approaches have also been developed that just compute the action to do next rather than full policies. In this work, we address a similar problem but in a multiagent setting where agents share a common goal and plan with beliefs which are about the world and the possibly nested beliefs of other agents. For this, we extend the belief tracking formulation of Kominis and Geffner to the on-line setting where plans are supposed to work for the true hidden state as revealed by the observations, and develop an alternative translation into classical planning that is used within a plan-execute-observe-and-replan cycle. Planning is done from the perspective of the agents, and there is a single planning agent in each replanning episode that can change across episodes. We present empirical results and show that interesting agent dialogues arise in this setting where agents collaborate by requesting or volunteering information in a goal-directed manner.

A Polynomial Planning Algorithm That Beats LAMA and FF

AAAI Conferences

It has been shown recently that heuristic and width-based search can be combined to produce planning algorithms with a performance that goes beyond the state-of-the-art. Such algorithms are based on best-first width search (BFWS), a plain best-first search set with evaluations functions combined lexicographically to break ties, some of which express novelty based preferences. In BFWS(f5), for example, the evaluation function f5 weights nodes by a novelty measure, breaking ties by the number of non-achieved goals. BFWS(f5) is a best-first algorithm, and hence, it is complete but not polynomial, and its performance doesn’t match the state of the art. In this work we show, however, that incomplete versions of BFWS(f5) where nodes with novelty greater than k are pruned, are not only polynomial but have an empirical performance that is better than both BFWS(f5) and state-of-the-art planners. This is shown by considering all the international planning competition instances. This is the first time where polynomial algorithms with meaningful bounds are shown to achieve state-of-the-art performance in planning. Practical and theoretical implications of this empirical finding are briefly sketched.

Best-First Width Search: Exploration and Exploitation in Classical Planning

AAAI Conferences

It has been shown recently that the performance of greedy best-first search (GBFS) for computing plans that are not necessarily optimal can be improved by adding forms of exploration when reaching heuristic plateaus: from random walks to local GBFS searches. In this work, we address this problem but using structural exploration methods resulting from the ideas of width-based search. Width-based methodsseek novel states, are not goal oriented, and their power has been shown recently in the Atari and GVG-AI video-games. We show first that width-based exploration in GBFS is more effective than GBFS with local GBFS search (GBFS-LS), and then proceed to formulate a simple and general computational framework where standard goal-oriented search (exploitation) and width-based search (structural exploration) are combined to yield a search scheme, best-first width search, that is better than both and which results in classical planning algorithms that outperform the state-of-the-art planners.

Traps, Invariants, and Dead-Ends

AAAI Conferences

We consider the problem of deriving formulas that capture traps, invariants, and dead-ends in classical planning through polynomial forms of preprocessing. An invariant is a formula that is true in the initial state and in all reachable states. A trap is a conditional invariant: once a state is reached that makes the trap true, all the states that are reachable from it will sat- isfy the trap formula as well. Finally, dead-ends are formulas that are satisfied in states that make the goal unreachable. We introduce a preprocessing algorithm that computes traps in k- DNF form that is exponential in the k parameter, and show how the algorithm can be used to precompute invariants and dead-ends. We report also preliminary tests that illustrate the effectiveness of the preprocessing algorithm for identifying dead-end states, and compare it with the identification that follows from the use of the h1 and h2 heuristics that cannot be preprocessed, and must be computed at run time.

Policies that Generalize: Solving Many Planning Problems with the Same Policy

AAAI Conferences

We establish conditions under which memoryless policies and finite-state controllers that solve one partially observable non-deterministic problem (PONDP) generalize to other problems; namely, problems that have a similar structure and share the same action and observation space. This is relevant to generalized planning where plans that work for many problems are sought, and to transfer learning where knowledge gained in the solution of one problem is to be used on related problems. We use a logical setting where uncertainty is represented by sets of states and the goal is to be achieved with certainty. While this gives us crisp notions of solution policies and generalization, the account also applies to probabilistic  PONDs, i.e., Goal POMDPs.

Classical Planning Algorithms on the Atari Video Games

AAAI Conferences

The Atari 2600 games supported in the Arcade Learning Environment (Bellemare et al. 2013) all feature aknown initial (RAM) state and actions that have deterministic effects. Classical planners, however, cannot be used for selecting actions for two reasons: first, nocompact PDDL-model of the games is given, and more importantly, the action effects and goals are not known a priori. Moreover, in these games there is usually no set of goals to be achieved but rewards to be collected. These features do not preclude the use of classical algorithms like breadth-first search or Dijkstra’s algorithm, but these methods are not effective over large state spaces. We thus turn to a different class of classical planning algorithms introduced recently that perform a structured exploration of the state space; namely, like breadth-first search and Dijkstra’s algorithm they are“blind” and hence do not require prior knowledge of state transitions, costs (rewards) or goals, and yet, like heuristic search algorithms, they have been shown to be effective for solving problems over huge state spaces.The simplest such algorithm, called Iterated Width or IW, consists of a sequence of calls IW(1), IW(2), . . . ,IW(k) where IW(i) is a breadth-first search in which a state is pruned when it is not the first state in the search to make true some subset of i atoms. The empirical results over 54 games suggest that the performance of IW with the k parameter fixed to 1, i.e., IW(1), is at the level of the state of the art represented by UCT. A simple best-first variation of IW that combines exploration and exploitation proves to be very competitive as well.