Goto

Collaborating Authors

 Planning & Scheduling


DESPOT: Online POMDP Planning with Regularization

Neural Information Processing Systems

POMDPs provide a principled framework for planning under uncertainty, but are computationally intractable, due to the โ€œcurse of dimensionalityโ€ and the โ€œcurse of historyโ€. This paper presents an online lookahead search algorithm that alleviates these difficulties by limiting the search to a set of sampled scenarios. The execution of all policies on the sampled scenarios is summarized using a Determinized Sparse Partially Observable Tree (DESPOT), which is a sparsely sampled belief tree. Our algorithm, named Regularized DESPOT (R-DESPOT), searches the DESPOT for a policy that optimally balances the size of the policy and the accuracy on its value estimate obtained through sampling. We give an output-sensitive performance bound for all policies derived from the DESPOT, and show that R-DESPOT works well if a small optimal policy exists. We also give an anytime approximation to R-DESPOT. Experiments show strong results, compared with two of the fastest online POMDP algorithms.


Bayesian Mixture Modelling and Inference based Thompson Sampling in Monte-Carlo Tree Search

Neural Information Processing Systems

Monte-Carlo tree search is drawing great interest in the domain of planning under uncertainty, particularly when little or no domain knowledge is available. One of the central problems is the trade-off between exploration and exploitation. In this paper we present a novel Bayesian mixture modelling and inference based Thompson sampling approach to addressing this dilemma. The proposed Dirichlet-NormalGamma MCTS (DNG-MCTS) algorithm represents the uncertainty of the accumulated reward for actions in the MCTS search tree as a mixture of Normal distributions and inferences on it in Bayesian settings by choosing conjugate priors in the form of combinations of Dirichlet and NormalGamma distributions. Thompson sampling is used to select the best action at each decision node. Experimental results show that our proposed algorithm has achieved the state-of-the-art comparing with popular UCT algorithm in the context of online planning for general Markov decision processes.


Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

arXiv.org Artificial Intelligence

Bayesian model-based reinforcement learning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, finding the resulting Bayes-optimal policies is notoriously taxing, since the search space becomes enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our approach outperformed prior Bayesian model-based RL algorithms by a significant margin on several well-known benchmark problems -- because it avoids expensive applications of Bayes rule within the search tree by lazily sampling models from the current beliefs. We illustrate the advantages of our approach by showing it working in an infinite state space domain which is qualitatively out of reach of almost all previous work in Bayesian exploration.


Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search

Journal of Artificial Intelligence Research

Bayesian planning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, planning optimally in the face of uncertainty is notoriously taxing, since the search space is enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our approach avoids expensive applications of Bayes rule within the search tree by sampling models from current beliefs, and furthermore performs this sampling in a lazy manner. This enables it to outperform previous Bayesian model-based reinforcement learning algorithms by a significant margin on several well-known benchmark problems. As we show, our approach can even work in problems with an infinite state space that lie qualitatively out of reach of almost all previous work in Bayesian exploration.


The Complexity of Optimal Monotonic Planning: The Bad, The Good, and The Causal Graph

Journal of Artificial Intelligence Research

For almost two decades, monotonic, or ``delete free,'' relaxation has been one of the key auxiliary tools in the practice of domain-independent deterministic planning. In the particular contexts of both satisficing and optimal planning, it underlies most state-of-the-art heuristic functions. While satisficing planning for monotonic tasks is polynomial-time, optimal planning for monotonic tasks is NP-equivalent. Here we establish both negative and positive results on the complexity of some wide fragments of optimal monotonic planning, with the fragments being defined around the causal graph topology. Our results shed some light on the link between the complexity of general optimal planning and the complexity of optimal planning for the respective monotonic relaxations.


Generating Believable Stories in Large Domains

AAAI Conferences

Planning-based techniques are a very powerful tool for automated story generation. However, as the number of possible actions increases, traditional planning techniques suffer from a combinatorial explosion due to large branching factors. In this work, we apply Monte Carlo Tree Search (MCTS) techniques to generate stories in domains with large numbers of possible actions (100+). Our approach employs a Bayesian story evaluation method to guide the planning towards believable stories that reach a user defined goal. We generate stories in a novel domain with different type of story goals. Our approach shows an order of magnitude improvement in performance over traditional search techniques.


Automated Generation of Diverse NPC-Controlling FSMs Using Nondeterministic Planning Techniques

AAAI Conferences

We study the problem of generating a set of Finite State Machines (FSMs) modeling the behavior of multiple, distinct NPCs. We observe that nondeterministic planning techniques can be used to generate FSMs by following conventions typically used when manually creating FSMs modeling NPC behavior. We implement our ideas in DivNDP, the first algorithm for automated diverse FSM generation.


Integrating Monte Carlo Tree Search with Knowledge-Based Methods to Create Engaging Play in a Commercial Mobile Game

AAAI Conferences

Monte Carlo Tree Search (MCTS) has produced many recent breakthroughs in game AI research, particularly in computer Go. In this paper we consider how MCTS can be applied to create engaging AI for a popular commercial mobile phone game: Spades by AI Factory, which has been downloaded more than 2.5 million times. In particular, we show how MCTS can be integrated with knowledge-based methods to create an interesting, fun and strong player which makes far fewer plays that could be perceived by human observers as blunders than MCTS without the injection of knowledge. These blunders are particularly noticeable for Spades, where a human player must co-operate with an AI partner. MCTS gives objectively stronger play than the knowledge-based approach used in previous versions of the game and offers the flexibility to customise behaviour whilst maintaining a reusable core, with a reduced development cycle compared to purely knowledge-based techniques.


OnDroad Planner: Building Tourist Plans Using Traveling Social Network Information

AAAI Conferences

One of the key challenges in automated planning is to define the sources of information that will feed the initial state and goals of each planning task. In many domains, the information comes from company's databases. In other applications, the information is harder to obtain and it is usually partial. In this paper, we will describe an application on travel planning, where the initial state and goals will be obtained by crowdsourcing. Travel planning requires the use of plenty Internet-based resources; some of them are related to human generated opinions on all kinds of matters (e.g. hotels, places to visit, restaurants, ...). We present the OnDroad planner, a system that creates personalized tourist plans using the human generated information gathered from the minube traveling social network. OnDroad proposes an initial tourist guide according to the recommendation of the users profiles and their contacts. In addition, this guide can be continuously updated with newly generated data.


Herding the Crowd: Automated Planning for Crowdsourced Planning

AAAI Conferences

An important application of human computation is crowdsourced planning and scheduling. In this paper, we present an architecture for an automated system that can significantly improve the effectiveness of the crowd in collaborating and coming up with effective plans by herding it. We define two main problems that have to be solved when designing such automated crowd-herding systems: interpretation, and steering; and discuss how automated planning techniques can be used to solve these problems.