University of Strathclyde
Fast-Tracking Stationary MOMDPs for Adaptive Management Problems
Péron, Martin (Queensland University of Technology, CSIRO) | Becker, Kai Helge (University of Strathclyde) | Bartlett, Peter (University of California, Berkeley) | Chadès, Iadine (Commonwealth Scientific and Industrial Research Organisation)
Adaptive management is applied in conservation and natural resource management, and consists of making sequential decisions when the transition matrix is uncertain. Informally described as ’learning by doing’, this approach aims to trade off between decisions that help achieve the objective and decisions that will yield a better knowledge of the true transition matrix. When the true transition matrix is assumed to be an element of a finite set of possible matrices, solving a mixed observability Markov decision process (MOMDP) leads to an optimal trade-off but is very computationally demanding. Under the assumption (common in adaptive management) that the true transition matrix is stationary, we propose a polynomial-time algorithm to find a lower bound of the value function. In the corners of the domain of the value function (belief space), this lower bound is provably equal to the optimal value function. We also show that under further assumptions, it is a linear approximation of the optimal value function in a neighborhood around the corners. We evaluate the benefits of our approach by using it to initialize the solvers MO-SARSOP and Perseus on a novel computational sustainability problem and a recent adaptive management data challenge. Our approach leads to an improved initial value function and translates into significant computational gains for both solvers.
Optimising Plans Using Genetic Programming
Westerberg, C. Henrik (University of Edinburgh) | Levine, John (University of Strathclyde)
Finding the shortest plan for a given planning problem is extremely hard. We present a domain independent approach for plan optimisation based on Genetic Programming. The algorithm is seeded with correct plans created by hand-encoded heuristic policy sets. The plans are very unlikely to be optimal but are created quickly. The suboptimal plans are then evolved using a generational algorithm towards the optimal plan. We present initial results from Blocks World and found that GP method almost always improved sub-optimal plans, often drastically.
Exploiting Path Refinement Abstraction in Domain Transition Graphs
Gregory, Peter (University of Strathclyde) | Long, Derek (University of Strathclyde) | McNulty, Craig (University of Strathclyde) | Murphy, Susan M. (University of Strathclyde)
Partial Refinement A-Star (PRA* is an abstraction technique, based on clustering nearby nodes in graphs, useful in large path-planning problems. Abstracting the underlying graph yields a simpler problem whose solution can be used, by refinement, as a guide to a solution to the original problem. A fruitful way to view domain independent planning problems is as a collection of multi-valued variables that must perform synchronised transitions through graphs of possible values, where the edges are defined by the domain actions. Planning involves finding efficient paths through Domain Transition Graphs (DTGs). In problems where these graphs are large, planning can be prohibitively expensive. In this paper we explore two ways to exploit PRA* in DTGs.
Generalised Domain Model Acquisition from Action Traces
Cresswell, Stephen (The Stationery Office) | Gregory, Peter (University of Strathclyde)
One approach to the problem of formulating domain models for planning is to learn the models from example action sequences. The LOCM system demonstrated the feasibility of learning domain models from example action sequences only, with no observation of states before, during or after the plans. LOCM uses an object-centred representation, in which each object is represented by a single parameterised state machine. This makes it powerful for learning domains which fit within that representation, but there are some well-known domains which do not. This paper introduces LOCM2, a novel algorithm in which the domain representation of LOCM is generalised to allow multiple parameterised state machines to represent a single object. This extends the coverage of domains for which an adequate domain model can be learned. The LOCM2 algorithm is described and evaluated by testing domain learning from example plans from published results of past International Planning Competitions.
LPRPG-P: Relaxed Plan Heuristics for Planning with Preferences
Coles, Amanda (University of Strathclyde) | Coles, Andrew (University of Strathclyde)
In this paper we present a planner, LPRPG-P, capable of reasoning with the non-temporal subset of PDDL 3 preferences. Our focus is on computation of relaxed plan based heuristics that effectively guide a planner towards good solutions satisfying preferences. We build on the planner LPRPG, a hybrid relaxed planning graph (RPG)--linear programming (LP) approach. We make extensions to the RPG to reason with propositional preferences, and to the LP to reason with numeric preferences. LPRPG-P is the first planner with direct guidance for numeric preference satisfaction, exploiting the strong numeric reasoning of the LP. We introduce an anytime search approach for use with our new heuristic, and present results showing that LPRPG-P extends the state of the art in domain-independent planning with preferences.
Extending the Use of Inference in Temporal Planning as Forwards Search
Coles, Amanda Jane (University of Strathclyde) | Coles, Andrew Ian (University of Strathclyde) | Fox, Maria (University of Strathclyde) | Long, Derek (University of Strathclyde)
PDDL 2.1 supports modelling of complex temporal planning domains in which solutions must exploit concurrency. Few existing temporal planners can solve problems that require concurrency and those that do typically pay a performance price to deploy reasoning machinery that is not always required. In this paper we show how to improve the performance of forward-search planners that attempt to solve the full temporal planning problem, both by narrowing the use of the concurrency machinery to situations that demand it and also by improving the power of inference to prune redundant branches of the search space for common patterns of interaction in temporal domains that do require concurrency. Results illustrate the effectiveness of our ideas in improving the efficiency of a temporal planner that can solve problems with required concurrency, both in domains that exploit this ability and in those that do not.
Lifting the Limitations in a Rule-based Policy Language
Lindsay, Alan (University of Strathclyde) | Fox, Maria (University of Strathclyde) | Long, Derek (University of Strathclyde)
The predicates that are used to encode a planning domain in PDDL often do not include concepts that are important for effectively reasoning about problems in the domain. In particular, the effectiveness of rule-based policies in a domain depend on the concepts that can be expressed in the language used to capture those policies. In this work we investigate complimenting planning domain descriptions with abstract concepts and methods for making distinctions between similar objects. We present an architecture that allows a rule-based policy to reason with these additional concepts, using them to reason over structures that the rules would not be able to reason over without support. We demonstrate that this is sufficient to allow a rule-based policy to provide control in benchmark domains with interesting structures and we argue that our architecture could allow control knowledge learners to learn policies that provide control in these domains.
The Seventeenth International Conference on Automated Planning and Scheduling (ICAPS-07)
Boddy, Mark (Adventium Labs) | Fox, Maria (University of Strathclyde) | Thiébaux, Sylvie (Australian National University)
The Seventeenth International Conference on Automated Planning and Scheduling (ICAPS-07) was held in Providence, Rhode Island in September 2007. It covered the latest theoretical and practical advances in planning and scheduling. The conference was co-located with the Thirteenth International Conference on Principles and Practice of Constraint Programming (CP-07). ICAPS-07 also hosted the second edition of the International Competition on Knowledge Engineering for Planning and Scheduling.
The Seventeenth International Conference on Automated Planning and Scheduling (ICAPS-07)
Boddy, Mark (Adventium Labs) | Fox, Maria (University of Strathclyde) | Thiébaux, Sylvie (Australian National University)
The Seventeenth International Conference on Automated Planning and Scheduling (ICAPS-07) was held in Providence, Rhode Island in September 2007. It covered the latest theoretical and practical advances in planning and scheduling. The conference was co-located with the Thirteenth International Conference on Principles and Practice of Constraint Programming (CP-07). The program consisted of tutorials, workshops, system demonstrations, a doctoral consortium, and three days of technical presentations mostly in parallel sessions. ICAPS-07 also hosted the second edition of the International Competition on Knowledge Engineering for Planning and Scheduling. This report describes the conference in more detail.