Sanner, Scott
Practical Linear Value-approximation Techniques for First-order MDPs
Sanner, Scott, Boutilier, Craig
Recent work on approximate linear programming (ALP) techniques for first-order Markov Decision Processes (FOMDPs) represents the value function linearly w.r.t. a set of first-order basis functions and uses linear programming techniques to determine suitable weights. This approach offers the advantage that it does not require simplification of the first-order value function, and allows one to solve FOMDPs independent of a specific domain instantiation. In this paper, we address several questions to enhance the applicability of this work: (1) Can we extend the first-order ALP framework to approximate policy iteration to address performance deficiencies of previous approaches? (2) Can we automatically generate basis functions and evaluate their impact on value function quality? (3) How can we decompose intractable problems with universally quantified rewards into tractable subproblems? We propose answers to these questions along with a number of novel optimizations and provide a comparative empirical evaluation on logistics problems from the ICAPS 2004 Probabilistic Planning Competition.
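For orientation, the standard ground (propositional) form of approximate linear programming can be written as below. This is a sketch of the generic ALP formulation, not the first-order formulation developed in the paper. The symbols are assumptions introduced here: k basis functions b_i with weights w_i, reward R(s,a), transition model P(s'|s,a), and discount factor \gamma.

```latex
\begin{align*}
\min_{w_1,\dots,w_k} \quad & \sum_{s} \sum_{i=1}^{k} w_i \, b_i(s) \\
\text{subject to} \quad & \sum_{i=1}^{k} w_i \, b_i(s) \;\ge\; R(s,a) + \gamma \sum_{s'} P(s' \mid s, a) \sum_{i=1}^{k} w_i \, b_i(s')
  \qquad \forall s, a
\end{align*}
```

The ground form has one constraint per state-action pair. The first-order setting described in the abstract instead works with first-order basis functions, so the weights can be determined without committing to a specific domain instantiation.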
A Survey of the Seventh International Planning Competition
Coles, Amanda (King’s College London) | Coles, Andrew (King’s College London) | Olaya, Angel García (Universidad Carlos III de Madrid) | Jiménez, Sergio (Universidad Carlos III de Madrid) | López, Carlos Linares (Universidad Carlos III de Madrid) | Sanner, Scott (NICTA and Australian National University) | Yoon, Sungwook (Palo Alto Research Center)
In this article we review the 2011 International Planning Competition. We give an overview of the history of the competition, discussing how it has developed since its first edition in 1998. The 2011 competition was run in three separate tracks: the deterministic (classical) track, the learning track, and the uncertainty track. Each track posed its own set of new challenges, and the participants rose to these admirably; the results of each track show promising progress in its area. The competition attracted a record number of participants this year, confirming its continued position as a central pillar of the international planning research community.
Symbolic Dynamic Programming for Discrete and Continuous State MDPs
Sanner, Scott, Delgado, Karina Valdivia, de Barros, Leliane Nunes
Many real-world decision-theoretic planning problems can be naturally modeled with discrete and continuous state Markov decision processes (DC-MDPs). While previous work has addressed automated decision-theoretic planning for DC-MDPs, optimal solutions have only been defined so far for limited settings, e.g., DC-MDPs having hyper-rectangular piecewise linear value functions. In this work, we extend symbolic dynamic programming (SDP) techniques to provide optimal solutions for a vastly expanded class of DC-MDPs. To address the inherent combinatorial aspects of SDP, we introduce the XADD - a continuous variable extension of the algebraic decision diagram (ADD) - that maintains compact representations of the exact value function. Empirically, we demonstrate an implementation of SDP with XADDs on various DC-MDPs, showing the first optimal automated solutions to DC-MDPs with linear and nonlinear piecewise partitioned value functions and showing the advantages of constraint-based pruning for XADDs.
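As a rough illustration of why a decision diagram over continuous variables is useful for piecewise value functions, the toy sketch below represents a function as a tree whose internal nodes test a linear inequality and whose leaves are linear expressions. The names `Linear`, `Decision`, `evaluate`, and `leaf_max` are hypothetical and introduced only for this example; it is not the authors' XADD implementation, and it omits node sharing, nonlinear leaves, and constraint-based pruning.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Linear:
    """Leaf (or test expression): const + sum(coeffs[v] * x[v])."""
    coeffs: Dict[str, float]
    const: float = 0.0

    def value(self, x: Dict[str, float]) -> float:
        return self.const + sum(c * x[v] for v, c in self.coeffs.items())


@dataclass
class Decision:
    """Internal node: follow `high` if test.value(x) >= 0, else `low`."""
    test: Linear
    high: "Node"
    low: "Node"


Node = Union[Linear, Decision]


def evaluate(node: Node, x: Dict[str, float]) -> float:
    """Walk from the root to a leaf and evaluate the leaf at point x."""
    while isinstance(node, Decision):
        node = node.high if node.test.value(x) >= 0 else node.low
    return node.value(x)


def leaf_max(f: Linear, g: Linear) -> Decision:
    """max(f, g) for two linear leaves: a new decision on the sign of f - g."""
    names = set(f.coeffs) | set(g.coeffs)
    diff = Linear(
        {v: f.coeffs.get(v, 0.0) - g.coeffs.get(v, 0.0) for v in names},
        f.const - g.const,
    )
    return Decision(test=diff, high=f, low=g)


# Example: max(x, 3) becomes "if x - 3 >= 0 then x else 3".
v = leaf_max(Linear({"x": 1.0}), Linear({}, 3.0))
print(evaluate(v, {"x": 5.0}), evaluate(v, {"x": 1.0}))
```

Taking the maximum of two leaves creates a new inequality test. This is one way to see how dynamic programming backups can keep introducing partitions of the continuous state space, and why a compact diagram representation matters.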
Gaussian Process Preference Elicitation
Guo, Shengbo, Sanner, Scott, Bonilla, Edwin V.
Bayesian approaches to preference elicitation (PE) are particularly attractive due to their ability to explicitly model uncertainty in users' latent utility functions. However, previous approaches to Bayesian PE have ignored the important problem of generalizing from previous users to an unseen user in order to reduce the elicitation burden on new users. In this paper, we address this deficiency by introducing a Gaussian Process (GP) prior over users' latent utility functions on the joint space of user and item features. We learn the hyper-parameters of this GP on a set of preferences of previous users and use it to aid in the elicitation process for a new user. This approach provides a flexible model of a multi-user utility function, facilitates an efficient value of information (VOI) heuristic query selection strategy, and provides a principled way to incorporate the elicitations of multiple users back into the model. We show the effectiveness of our method in comparison to previous work on a real dataset of user preferences over sushi types.
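One simple way to place a GP prior on the joint space of user and item features is a product of two kernels, one over user features and one over item features. The sketch below assumes that form, with RBF kernels and unit length scales. The product structure, the function names, and the feature vectors are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (user features, item features)


def rbf(a: Sequence[float], b: Sequence[float], length_scale: float = 1.0) -> float:
    """Squared-exponential kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def joint_kernel(p: Pair, q: Pair,
                 user_scale: float = 1.0, item_scale: float = 1.0) -> float:
    """Assumed product kernel: similar users AND similar items give high covariance."""
    (u1, x1), (u2, x2) = p, q
    return rbf(u1, u2, user_scale) * rbf(x1, x2, item_scale)


def gram(pairs: List[Pair]) -> List[List[float]]:
    """Prior covariance matrix of latent utilities over (user, item) pairs."""
    return [[joint_kernel(p, q) for q in pairs] for p in pairs]


# Hypothetical data: two users (1-D features) and two items (2-D features).
pairs = [
    ((0.0,), (1.0, 0.0)),  # user A, item 1
    ((0.0,), (0.0, 1.0)),  # user A, item 2
    ((1.0,), (1.0, 0.0)),  # user B, item 1
]
K = gram(pairs)
```

Under this kernel, the utility that a new user assigns to an item is correlated with the utilities that similar earlier users assigned to similar items. That correlation is what allows earlier users' preferences to reduce the number of queries needed for a new user.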
Symbolic Dynamic Programming for First-order POMDPs
Sanner, Scott (NICTA and ANU) | Kersting, Kristian (Fraunhofer IAIS)
Partially-observable Markov decision processes (POMDPs) provide a powerful model for sequential decision-making problems with partially-observed state and are known to have (approximately) optimal dynamic programming solutions. Much work in recent years has focused on improving the efficiency of these dynamic programming algorithms by exploiting symmetries and factored or relational representations. In this work, we show that it is also possible to exploit the full expressive power of first-order quantification to achieve state, action, and observation abstraction in a dynamic programming solution to relationally specified POMDPs. Among the advantages of this approach are the ability to maintain compact value function representations, abstract over the space of potentially optimal actions, and automatically derive compact conditional policy trees that minimally partition relational observation spaces according to distinctions that have an impact on policy values. This is the first lifted relational POMDP solution that can optimally accommodate actions with a potentially infinite relational space of observation outcomes.
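For readers less familiar with POMDPs, the sketch below shows the standard belief update for a flat (enumerated-state) POMDP, b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). It is background only: the paper's contribution is a lifted, first-order solution that avoids this kind of enumeration. The table layout `T[action][s][s2]` and `O[action][s2][obs]`, and the two-state "tiger"-style example, are assumptions chosen for illustration.

```python
from typing import Dict

Belief = Dict[str, float]


def belief_update(belief: Belief, action: str, obs: str,
                  T: Dict[str, Dict[str, Dict[str, float]]],
                  O: Dict[str, Dict[str, Dict[str, float]]]) -> Belief:
    """Bayes-filter update of a belief after taking `action` and seeing `obs`."""
    states = list(belief)
    unnormalized = {}
    for s2 in states:
        # Prediction step: probability of landing in s2 under the transition model.
        predicted = sum(T[action][s][s2] * belief[s] for s in states)
        # Correction step: weight by the likelihood of the observation in s2.
        unnormalized[s2] = O[action][s2][obs] * predicted
    z = sum(unnormalized.values())
    if z == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / z for s, p in unnormalized.items()}


# Two hidden states; "listen" leaves the state unchanged and is 85% accurate.
T = {"listen": {"left": {"left": 1.0, "right": 0.0},
                "right": {"left": 0.0, "right": 1.0}}}
O = {"listen": {"left": {"hear-left": 0.85, "hear-right": 0.15},
                "right": {"hear-left": 0.15, "hear-right": 0.85}}}

b0 = {"left": 0.5, "right": 0.5}
b1 = belief_update(b0, "listen", "hear-left", T, O)
```

The update loops over every state, which is the cost that state, action, and observation abstraction aims to avoid in relationally specified problems.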