Goto

Collaborating Authors

 Geffner, Hector


Learning Features and Abstract Actions for Computing Generalized Plans

arXiv.org Artificial Intelligence

Generalized planning is concerned with the computation of plans that solve not one but multiple instances of a planning domain. Recently, it has been shown that generalized plans can be expressed as mappings of feature values into actions, and that they can often be computed with fully observable non-deterministic (FOND) planners. The actions in such plans, however, are not the actions in the instances themselves, which are not necessarily common to other instances, but abstract actions that are defined on a set of common features. The formulation assumes that the features and the abstract actions are given. In this work, we address this limitation by showing how to learn them automatically. The resulting account of generalized planning combines learning and planning in a novel way: a learner, based on a Max SAT formulation, yields the features and abstract actions from sampled state transitions, and a FOND planner uses this information, suitably transformed, to produce the general plans. Correctness guarantees are given and experimental results on several domains are reported.


Compact Policies for Fully-Observable Non-Deterministic Planning as SAT

arXiv.org Artificial Intelligence

Fully observable non-deterministic (FOND) planning is becoming increasingly important as an approach for computing proper policies in probabilistic planning, extended temporal plans in LTL planning, and general plans in generalized planning. In this work, we introduce a SAT encoding for FOND planning that is compact and can produce compact strong cyclic policies. Simple variations of the encodings are also introduced for strong planning and for what we call, dual FOND planning, where some non-deterministic actions are assumed to be fair (e.g., probabilistic) and others unfair (e.g., adversarial). The resulting FOND planners are compared empirically with existing planners over existing and new benchmarks. The notion of "probabilistic interesting problems" is also revisited to yield a more comprehensive picture of the strengths and limitations of current FOND planners and the proposed SAT approach.


Model-free, Model-based, and General Intelligence

arXiv.org Artificial Intelligence

During the 60s and 70s, AI researchers explored intuitions about intelligence by writing programs that displayed intelligent behavior. Many good ideas came out from this work but programs written by hand were not robust or general. After the 80s, research increasingly shifted to the development of learners capable of inferring behavior and functions from experience and data, and solvers capable of tackling well-defined but intractable models like SAT, classical planning, Bayesian networks, and POMDPs. The learning approach has achieved considerable success but results in black boxes that do not have the flexibility, transparency, and generality of their model-based counterparts. Model-based approaches, on the other hand, require models and scalable algorithms. Model-free learners and model-based solvers have close parallels with Systems 1 and 2 in current theories of the human mind: the first, a fast, opaque, and inflexible intuitive mind; the second, a slow, transparent, and flexible analytical mind. In this paper, I review developments in AI and draw on these theories to discuss the gap between model-free learners and model-based solvers, a gap that needs to be bridged in order to have intelligent systems that are robust and general.


Features, Projections, and Representation Change for Generalized Planning

arXiv.org Artificial Intelligence

Generalized planning is concerned with the characterization and computation of plans that solve many instances at once. In the standard formulation, a generalized plan is a mapping from feature or observation histories into actions, assuming that the instances share a common pool of features and actions. This assumption, however, excludes the standard relational planning domains where actions and objects change across instances. In this work, we extend the standard formulation of generalized planning to such domains. This is achieved by projecting the actions over the features, resulting in a common set of abstract actions which can be tested for soundness and completeness, and which can be used for generating general policies such as "if the gripper is empty, pick the clear block above x and place it on the table" that achieve the goal clear(x) in any Blocksworld instance. In this policy, "pick the clear block above x" is an abstract action that may represent the action Unstack(a, b) in one situation and the action Unstack(b, c) in another. Transformations are also introduced for computing such policies by means of fully observable non-deterministic (FOND) planners. The value of generalized representations for learning general policies is also discussed.


Planning With Pixels in (Almost) Real Time

AAAI Conferences

Recently, width-based planning methods have been shown to yield state-of-the-art results in the Atari 2600 video games. For this, the states were associated with the (RAM) memory states of the simulator. In this work, we consider the same planning problem but using the screen instead. By using the same visual inputs, the planning results can be compared with those of humans and learning methods. We show that the planning approach, out of the box and without training, results in scores that compare well with those obtained by humans and learning methods, and moreover, by developing an episodic, rollout version of the IW(k) algorithm, we show that such scores can be obtained in almost real time.


A Polynomial Planning Algorithm That Beats LAMA and FF

AAAI Conferences

It has been shown recently that heuristic and width-based search can be combined to produce planning algorithms with a performance that goes beyond the state-of-the-art. Such algorithms are based on best-first width search (BFWS), a plain best-first search set with evaluations functions combined lexicographically to break ties, some of which express novelty based preferences. In BFWS(f5), for example, the evaluation function f5 weights nodes by a novelty measure, breaking ties by the number of non-achieved goals. BFWS(f5) is a best-first algorithm, and hence, it is complete but not polynomial, and its performance doesn’t match the state of the art. In this work we show, however, that incomplete versions of BFWS(f5) where nodes with novelty greater than k are pruned, are not only polynomial but have an empirical performance that is better than both BFWS(f5) and state-of-the-art planners. This is shown by considering all the international planning competition instances. This is the first time where polynomial algorithms with meaningful bounds are shown to achieve state-of-the-art performance in planning. Practical and theoretical implications of this empirical finding are briefly sketched.


Best-First Width Search: Exploration and Exploitation in Classical Planning

AAAI Conferences

It has been shown recently that the performance of greedy best-first search (GBFS) for computing plans that are not necessarily optimal can be improved by adding forms of exploration when reaching heuristic plateaus: from random walks to local GBFS searches. In this work, we address this problem but using structural exploration methods resulting from the ideas of width-based search. Width-based methodsseek novel states, are not goal oriented, and their power has been shown recently in the Atari and GVG-AI video-games. We show first that width-based exploration in GBFS is more effective than GBFS with local GBFS search (GBFS-LS), and then proceed to formulate a simple and general computational framework where standard goal-oriented search (exploitation) and width-based search (structural exploration) are combined to yield a search scheme, best-first width search, that is better than both and which results in classical planning algorithms that outperform the state-of-the-art planners.


Width-Based Planning for General Video-Game Playing

AAAI Conferences

IW(1) is a simple search algorithm that assumes that states can be characterized in terms of a set of boolean features or atoms. IW(1) consists of a standard breadth-first search with one variation: a newly generated state is pruned if it does not make a new atom true. Thus, while a breadth-first search runs in time that is exponential in the number of atoms, IW(1) runs in linear time. Variations of the algorithm have been shown to yield state-of-the-art results in classical planning and more recently in the Atari video games. In this paper, we use the algorithm for selecting actions in the games of the general video-game AI competition (GVG-AI) which, unlike classical planning problems and the Atari games, are stochastic. We evaluate a variation of the algorithm over 30 games under different time windows using the number of wins as the performance measure. We find that IW(1) does better than the sample MCTS and OLMCTS controllers for all time windows with the performance gap growing with the window size. The exception are the puzzle-like games where all the algorithms do poorly. For such problems, we show that much better results can be obtained with the IW(2) algorithm, which is like IW(1), except that states are pruned in the breadth-first search when they fail to make true a new pair of atoms.


Classical Planning with Simulators: Results on the Atari Video Games

AAAI Conferences

The Atari 2600 games supported in the Arcade Learning Environment [Bellemare et al., 2013] all feature a known initial (RAM) state and actions that have deterministic effects. Classical planners, however, cannot be used off-the-shelf as there is no compact PDDL-model of the games, and action effects and goals are not known a priori. Indeed, there are no explicit goals, and the planner must select actions on line while interacting with a simulator that returns successor states and rewards. None of this precludes the use of blind lookahead algorithms for action selection like breadth-first search or Dijkstra’s yet such methods are not effective over large state spaces. We thus turn to a different class of planning methods introduced recently that have been shown to be effective for solving large planning problems but which do not require prior knowledge of state transitions, costs (rewards) or goals. The empirical results over 54 Atari games show that the simplest such algorithm performs at the level of UCT, the state-of-the-art planning method in this domain, and suggest the potential of width-based methods for planning with simulators when factored, compact action models are not available.


Flexible and Scalable Partially Observable Planning with Linear Translations

AAAI Conferences

The problem of on-line planning in partially observable settings involves two problems: keeping track of beliefs about the environment and selecting actions for achieving goals. While the two problems are computationally intractable in the worst case, significant progress has been achieved in recent years through the use of suitable reductions. In particular, the state-of-the-art CLG planner is based on a translation that maps deterministic partially observable problems into fully observable non-deterministic ones. The translation, which is quadratic in the number of problem fluents and gets rid of the belief tracking problem, is adequate for most benchmarks, and it is in fact complete for problems that have width 1. The more recent K-replanner uses translations that are linear, one for keeping track of beliefs and the other for selecting actions using off-the-shelf classical planners. As a result, the K-replanner scales up better but it is not as general. In this work, we combine the benefits of the two approaches - the scope of the CLG planner and the efficiency of the Kreplanner. The new planner, called LW1, is based on a translation that is linear but complete for width-1 problems. The scope and scalability of the new planner is evaluated experimentally by considering the existing benchmarks and new problems.