Goto

Collaborating Authors

 Planning & Scheduling


Action Selection for MDPs: Anytime AO* Versus UCT

AAAI Conferences

One of the natural approaches for selecting actions in very From this perspective, an algorithm like RTDP fails on two large state spaces is by performing a limited amount of grounds: first, RTDP does not appear to make best use of lookahead. In the contexts of discounted MDPs, Kearns, short time windows in large state spaces; second, and more Mansour, and Ng have shown that near to optimal actions importantly, RTDP can use admissible heuristics but not informed can be selected by considering a sampled lookahead tree that base policies. On the other hand, algorithms like Policy is sufficiently sparse, whose size depends on the discount Iteration (Howard 1971), deliver all of these features except factor and the suboptimality bound but not on the number of one: they are exhaustive, and thus even to get started, problem states (Kearns, Mansour, and Ng 1999). The UCT they need vectors with the size of the state space. At the algorithm (Kocsis and Szepesvári 2006) is a version of this same time, while there are non-exhaustive versions of (asynchronous) form of Monte Carlo planning, where the lookahead trees Value Iteration such as RTDP, there are no similar are not grown depth-first but'best-first', following a selection'focused' versions of Policy Iteration ensuring anytime optimality.


The Complexity of Planning Revisited — A Parameterized Analysis

AAAI Conferences

The early classifications of the computational complexity of planning under various restrictions in STRIPS (Bylander) and SAS+ (Bäckström and Nebel) have influenced following research in planning in many ways. We go back and reanalyse their subclasses, but this time using the more modern tool of parameterized complexity analysis. This provides new results that together with the old results give a more detailed picture of the complexity landscape. We demonstrate separation results not possible with standard complexity theory, which contributes to explaining why certain cases of planning have seemed simpler in practice than theory has predicted. In particular, we show that certain restrictions of practical interest are tractable in the parameterized sense of the term, and that a simple heuristic is sufficient to make a well-known partial-order planner exploit this fact.


Generalized Monte-Carlo Tree Search Extensions for General Game Playing

AAAI Conferences

General Game Playing (GGP) agents must be capable of playing a wide variety of games skillfully. Monte-Carlo Tree Search (MCTS) has proven an effective reasoning mechanism for this challenge, as is reflected by its popularity among designers of GGP agents. Providing GGP agents with the knowledge relevant to the game at hand in real time is, however, a challenging task. In this paper we propose two enhancements for MCTS in the context of GGP, aimed at improving the effectiveness of the simulations in real time based on in-game statistical feedback. The first extension allows early termination of lengthy and uninformative simulations while the second improves the action-selection strategy when both explored and unexplored actions are available. The methods are empirically evaluated in a state-of-the-art GGP agent and shown to yield an overall significant improvement in playing strength.


Learning Qualitative Models by Demonstration

AAAI Conferences

Creating software agents that learn interactively requires the ability to learn from a small number of trials, extracting general, flexible knowledge that can drive behavior from observation and interaction. We claim that qualitative models provide a useful intermediate level of causal representation for dynamic domains, including the formulation of strategies and tactics. We argue that qualitative models are quickly learnable, and enable model-based reasoning techniques to be used to recognize, operationalize, and construct more strategic knowledge. This paper describes an approach to incrementally learning qualitative influences by demonstration in the context of a strategy game. We show how the learned model can help a system play by enabling it to explain which actions could contribute to maximizing a quantitative goal. We also show how reasoning about the model allows it to reformulate a learning problem to address delayed effects and credit assignment, such that it can improve its performance on more strategic tasks such as city placement.


Towards Automated Choreographing of Web Services Using Planning

AAAI Conferences

For Web service composition, choreography has recently received great attention and demonstrated a few key advantages over orchestration such as distributed control, fairness, data efficiency, and scalability. Automated design of choreography plans, especially distributed plans for multiple roles, is more complex and has not been studied before. Existing work requires manual generation assisted by model checking. In this paper, we propose a novel planning-based approach that can automatically convert a given composition task to a distributed choreography specification. Although planning has been used for orchestration, it is difficult to use planning for choreography, as it involves decentralized control, concurrent workflows, and contingency. We propose a few novel techniques, including compilation of contingencies, dependency graph analysis, and communication control, to handle these characteristics using planning. We theoretically show the correctness of this approach and empirically evaluate its practicability.


Exploiting First-Order Regression in Inductive Policy Selection

arXiv.org Artificial Intelligence

We consider the problem of computing optimal generalised policies for relational Markov decision processes. We describe an approach combining some of the benefits of purely inductive techniques with those of symbolic dynamic programming methods. The latter reason about the optimal value function using first-order decision theoretic regression and formula rewriting, while the former, when provided with a suitable hypotheses language, are capable of generalising value functions or policies for small instances. Our idea is to use reasoning and in particular classical first-order regression to automatically generate a hypotheses language dedicated to the domain at hand, which is then used as input by an inductive solver. This approach avoids the more complex reasoning of symbolic dynamic programming while focusing the inductive solver's attention on concepts that are specifically relevant to the optimal value function for the domain considered.


Narrative Planning: Compilations to Classical Planning

Journal of Artificial Intelligence Research

A model of story generation recently proposed by Riedl and Young casts it as planning, with the additional condition that story characters behave intentionally. This means that characters have perceivable motivation for the actions they take. I show that this condition can be compiled away (in more ways than one) to produce a classical planning problem that can be solved by an off-the-shelf classical planner, more efficiently than by Riedl and Young's specialised planner.


Plan-based Policies for Efficient Multiple Battery Load Management

Journal of Artificial Intelligence Research

Efficient use of multiple batteries is a practical problem with wide and growing application. The problem can be cast as a planning problem under uncertainty. We describe the approach we have adopted to modelling and solving this problem, seen as a Markov Decision Problem, building effective policies for battery switching in the face of stochastic load profiles. Our solution exploits and adapts several existing techniques: planning for deterministic mixed discrete-continuous problems and Monte Carlo sampling for policy learning. The paper describes the development of planning techniques to allow solution of the non-linear continuous dynamic models capturing the battery behaviours. This approach depends on carefully handled discretisation of the temporal dimension. The construction of policies is performed using a classification approach and this idea offers opportunities for wider exploitation in other problems. The approach and its generality are described in the paper. Application of the approach leads to construction of policies that, in simulation, significantly outperform those that are currently in use and the best published solutions to the battery management problem. We achieve solutions that achieve more than 99% efficiency in simulation compared with the theoretical limit and do so with far fewer battery switches than existing policies. Behaviour of physical batteries does not exactly match the simulated models for many reasons, so to confirm that our theoretical results can lead to real measured improvements in performance we also conduct and report experiments using a physical test system. These results demonstrate that we can obtain 5%-15% improvement in lifetimes in the case of a two battery system.


Compositional Planning Using Optimal Option Models

arXiv.org Artificial Intelligence

In this paper we introduce a framework for option model composition. Option models are temporal abstractions that, like macro-operators in classical planning, jump directly from a start state to an end state. Prior work has focused on constructing option models from primitive actions, by intra-option model learning; or on using option models to construct a value function, by inter-option planning. We present a unified view of intra- and inter-option model learning, based on a major generalisation of the Bellman equation. Our fundamental operation is the recursive composition of option models into other option models. This key idea enables compositional planning over many levels of abstraction. We illustrate our framework using a dynamic programming algorithm that simultaneously constructs optimal option models for multiple subgoals, and also searches over those option models to provide rapid progress towards other subgoals.


The third open Answer Set Programming competition

arXiv.org Artificial Intelligence

Answer Set Programming (ASP) is a well-established paradigm of declarative programming in close relationship with other declarative formalisms such as SAT Modulo Theories, Constraint Handling Rules, FO(.), PDDL and many others. Since its first informal editions, ASP systems have been compared in the now well-established ASP Competition. The Third (Open) ASP Competition, as the sequel to the ASP Competitions Series held at the University of Potsdam in Germany (2006-2007) and at the University of Leuven in Belgium in 2009, took place at the University of Calabria (Italy) in the first half of 2011. Participants competed on a pre-selected collection of benchmark problems, taken from a variety of domains as well as real world applications. The Competition ran on two tracks: the Model and Solve (M&S) Track, based on an open problem encoding, and open language, and open to any kind of system based on a declarative specification paradigm; and the System Track, run on the basis of fixed, public problem encodings, written in a standard ASP language. This paper discusses the format of the Competition and the rationale behind it, then reports the results for both tracks. Comparison with the second ASP competition and state-of-the-art solutions for some of the benchmark domains is eventually discussed. To appear in Theory and Practice of Logic Programming (TPLP).