AITopics | Planning & Scheduling

Planning & Scheduling

"Planning is the process of generating (possibly partial) representations of future behavior prior to the use of such plans to constrain or control that behavior. The outcome is usually a set of actions, with temporal and other constraints on them, for execution by some agent or agents. As a core aspect of human intelligence, planning has been studied since the earliest days of AI and cognitive science. Planning research has led to many useful tools for real-world applications, and has yielded significant insights into the organization of behavior and the nature of reasoning about actions."
– Planning entry by Austin Tate in the MIT Encyclopedia of Cognitive Science.

Monte-Carlo Planning in Large POMDPs

Silver, David, Veness, Joel

Neural Information Processing Systems, Feb-15-2020, 03:26:51 GMT

This paper introduces a Monte-Carlo algorithm for online planning in large POMDPs. The algorithm combines a Monte-Carlo update of the agent's belief state with a Monte-Carlo tree search from the current belief state. The new algorithm, POMCP, has two important properties. First, Monte-Carlo sampling is used to break the curse of dimensionality both during belief state updates and during planning. Second, only a black box simulator of the POMDP is required, rather than explicit probability distributions.

algorithm, monte-carlo planning, pomdp, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.98)
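
The abstract's claim that POMCP needs only a black-box simulator can be illustrated with the belief-update half of the method. The sketch below is a minimal rejection-sampling particle update, assuming a simulator that maps (state, action, rng) to (next_state, observation, reward). The Tiger-style toy simulator, its 85% listening accuracy and all function names are hypothetical, and the tree-search half of POMCP is not shown.

```python
import random
from typing import Callable, Hashable, List, Tuple

# Black-box generative model: (state, action, rng) -> (next_state, observation, reward).
Simulator = Callable[[Hashable, Hashable, random.Random], Tuple[Hashable, Hashable, float]]


def update_belief(
    particles: List[Hashable],
    action: Hashable,
    real_obs: Hashable,
    simulator: Simulator,
    n_particles: int,
    rng: random.Random,
    max_tries: int = 100_000,
) -> List[Hashable]:
    """Rejection-sampling particle update that only calls the simulator.

    Sample a state from the current particle set, push it through the
    simulator with the action actually taken, and keep the successor only
    if the simulated observation matches the real one.
    """
    new_particles: List[Hashable] = []
    tries = 0
    while len(new_particles) < n_particles and tries < max_tries:
        tries += 1
        state = rng.choice(particles)
        next_state, obs, _reward = simulator(state, action, rng)
        if obs == real_obs:
            new_particles.append(next_state)
    return new_particles


def tiger_simulator(state, action, rng):
    """Hypothetical Tiger-style toy: 'listen' reports the tiger's side 85% of the time."""
    assert action == "listen"  # only the listening action is modeled in this toy
    correct = rng.random() < 0.85
    obs = state if correct else ("right" if state == "left" else "left")
    return state, obs, -1.0


rng = random.Random(0)
belief = ["left"] * 500 + ["right"] * 500
belief = update_belief(belief, "listen", "left", tiger_simulator, 1000, rng)
print(belief.count("left") / len(belief))  # should be close to 0.85
```

Rejection sampling discards every particle whose simulated observation disagrees with the real one, so it can starve when observations are rare; `max_tries` caps the work in that case, and the returned list may then be shorter than `n_particles`.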

PDDLGym: Gym Environments from PDDL Problems

Silver, Tom, Chitnis, Rohan

arXiv.org Artificial Intelligence, Feb-15-2020

Observations and actions in PDDLGym are relational, making the framework particularly well-suited for research in relational reinforcement learning and relational sequential decision-making. PDDLGym is also useful as a generic framework for rapidly building numerous, diverse benchmarks from a concise and familiar specification language. We discuss design decisions and implementation details, and also illustrate empirical variations between the 15 built-in environments in terms of planning and model-learning difficulty. We hope that PDDLGym will facilitate bridge-building between the reinforcement learning community (from which Gym emerged) and the AI planning community (which produced PDDL). We look forward to gathering feedback from all those interested and expanding the set of available environments and features accordingly.

operator, pddlgym, problem file, (14 more...)

arXiv.org Artificial Intelligence

2002.06432

Country:

Asia > Vietnam > Hanoi > Hanoi (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Colorado (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
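
To make "relational observations and actions" concrete, here is a hypothetical Gym-style environment whose observation is a frozen set of ground literals and whose actions are ground STRIPS-style operators. It follows the classic `reset()`/`step()` contract in spirit only; the class, the operator format and the reward of 1.0 on reaching the goal are assumptions for illustration, not PDDLGym's actual API.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple

Literal = Tuple[str, ...]  # a ground atom, e.g. ("at", "robot", "room-a")


@dataclass(frozen=True)
class GroundOperator:
    """A STRIPS-style ground action: preconditions plus add and delete effects."""
    name: str
    pre: FrozenSet[Literal]
    add: FrozenSet[Literal]
    delete: FrozenSet[Literal]


class RelationalEnv:
    """Hypothetical Gym-like environment whose observations are sets of literals.

    Mimics the classic reset()/step() contract; it is NOT PDDLGym's API.
    """

    def __init__(self, init: Iterable[Literal], goal: Iterable[Literal],
                 operators: Iterable[GroundOperator]):
        self.init = frozenset(init)
        self.goal = frozenset(goal)
        self.operators: Dict[str, GroundOperator] = {op.name: op for op in operators}
        self.state = self.init

    def reset(self) -> FrozenSet[Literal]:
        self.state = self.init
        return self.state

    def step(self, action: str):
        op = self.operators[action]
        if op.pre <= self.state:  # inapplicable actions leave the state unchanged
            self.state = (self.state - op.delete) | op.add
        done = self.goal <= self.state
        reward = 1.0 if done else 0.0  # assumed sparse goal reward
        return self.state, reward, done, {}


move = GroundOperator(
    name="move(robot, room-a, room-b)",
    pre=frozenset({("at", "robot", "room-a")}),
    add=frozenset({("at", "robot", "room-b")}),
    delete=frozenset({("at", "robot", "room-a")}),
)
env = RelationalEnv(init={("at", "robot", "room-a")},
                    goal={("at", "robot", "room-b")},
                    operators=[move])
obs = env.reset()
obs, reward, done, info = env.step("move(robot, room-a, room-b)")
```

Representing the state as a set of literals rather than a fixed-length vector is one reason relational representations suit problems whose number of objects varies.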

Action-Model Based Multi-agent Plan Recognition

Zhuo, Hankz H., Yang, Qiang, Kambhampati, Subbarao

Neural Information Processing Systems, Feb-14-2020, 21:42:36 GMT

Multi-Agent Plan Recognition (MAPR) aims to recognize dynamic team structures and team behaviors from the observed team traces (activity sequences) of a set of intelligent agents. Previous MAPR approaches required that a library of team activity sequences (team plans) be given as input. However, collecting a library of team plans to ensure adequate coverage is often difficult and costly. In this paper, we relax this constraint, so that team plans are not required to be provided beforehand. We assume instead that a set of action models are available.

library, multi-agent plan recognition, team plan, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling > Plan Recognition (0.79)

Monte-Carlo Tree Search for Constrained POMDPs

Lee, Jongmin, Kim, Geon-hyeong, Poupart, Pascal, Kim, Kee-Eung

Neural Information Processing Systems, Feb-14-2020, 19:58:06 GMT

Monte-Carlo Tree Search (MCTS) has been successfully applied to very large POMDPs, a standard model for stochastic sequential decision-making problems. However, many real-world problems inherently have multiple goals, for which multi-objective formulations are more natural. The constrained POMDP (CPOMDP) is one such model: it extends the standard POMDP to maximize reward while constraining cost. To date, solution methods for CPOMDPs assume an explicit model of the environment and are therefore hard to apply to large-scale real-world problems. In this paper, we present CC-POMCP (Cost-Constrained POMCP), an online MCTS algorithm for large CPOMDPs that leverages the optimization of LP-induced parameters and requires only a black-box simulator of the environment.

constrained pomdp, monte-carlo tree search, pomdp, (1 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
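
CC-POMCP's "LP-induced parameters" are, we assume here, Lagrange multipliers on the cost constraint. The sketch below shows only that idea on a hypothetical one-step problem: act greedily on reward minus λ·cost and adjust λ by projected subgradient ascent. It is not CC-POMCP, which interleaves such updates with Monte-Carlo tree search; the action names, payoffs, budget and step size are all made up.

```python
from typing import Dict, List, Tuple


def greedy_action(actions: List[str], reward: Dict[str, float],
                  cost: Dict[str, float], lam: float) -> str:
    """Best action for the scalarized objective reward - lam * cost (ties go to the first listed)."""
    return max(actions, key=lambda a: reward[a] - lam * cost[a])


def lagrangian_dual_ascent(actions: List[str], reward: Dict[str, float],
                           cost: Dict[str, float], budget: float,
                           step: float = 0.1, iters: int = 200
                           ) -> Tuple[float, Dict[str, float]]:
    """Projected subgradient ascent on the multiplier lam >= 0.

    lam grows while the chosen action overspends the cost budget and
    shrinks while it underspends. Returns the final lam and how often
    each action was chosen.
    """
    lam = 0.0
    counts = {a: 0 for a in actions}
    for _ in range(iters):
        a = greedy_action(actions, reward, cost, lam)
        counts[a] += 1
        lam = max(0.0, lam + step * (cost[a] - budget))
    return lam, {a: c / iters for a, c in counts.items()}


# Hypothetical one-step problem: "risky" pays more but alone exceeds the budget.
actions = ["safe", "risky"]
reward = {"safe": 1.0, "risky": 3.0}
cost = {"safe": 0.0, "risky": 2.0}
lam, freq = lagrangian_dual_ascent(actions, reward, cost, budget=1.0)
```

Here λ settles near 1, where both actions score the same, and the greedy choice alternates between them so that "risky" is picked close to half the time. That matches the optimal constrained policy for this toy (choose "risky" with probability 1/2 to spend exactly the budget of 1) and shows why constrained problems can require stochastic policies.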

Variational Planning for Graph-based MDPs

Cheng, Qiang, Liu, Qiang, Chen, Feng, Ihler, Alexander T.

Neural Information Processing Systems, Feb-14-2020, 19:12:40 GMT

Markov Decision Processes (MDPs) are extremely useful for modeling and solving sequential decision-making problems. Graph-based MDPs provide a compact representation for MDPs with large numbers of random variables. However, the complexity of exactly solving a graph-based MDP usually grows exponentially in the number of variables, which limits their application. We present a new variational framework to describe and solve the planning problem of MDPs, and derive both exact and approximate planning algorithms. In particular, by exploiting the graph structure of graph-based MDPs, we propose a factored variational value iteration algorithm in which the value function is first approximated by a product of local-scope value functions and then solved by minimizing a Kullback-Leibler (KL) divergence.

algorithm, graph-based mdp, variational planning, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.65)
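
The step "approximate by a product of local factors, then minimize a KL divergence" can be illustrated on a toy distribution. The sketch below fits a fully factored q(x) = q1(x1) q2(x2) to a correlated joint p by minimizing KL(p || q), whose minimizer over product distributions is the product of p's marginals. This is a generic illustration on a normalized distribution, not the paper's factored variational value iteration; the joint, the comparison factors and the KL direction are assumptions.

```python
import itertools
import math
from typing import Dict, List, Tuple

Joint = Dict[Tuple[int, ...], float]


def kl(p: Joint, q: Joint) -> float:
    """KL(p || q) for distributions over the same finite set of assignments."""
    return sum(pv * math.log(pv / q[x]) for x, pv in p.items() if pv > 0)


def marginals(p: Joint, n_vars: int) -> List[Dict[int, float]]:
    """Single-variable marginals p_i(x_i)."""
    margs: List[Dict[int, float]] = [dict() for _ in range(n_vars)]
    for x, pv in p.items():
        for i, xi in enumerate(x):
            margs[i][xi] = margs[i].get(xi, 0.0) + pv
    return margs


def product_of(factors: List[Dict[int, float]]) -> Joint:
    """Fully factored q(x) = prod_i q_i(x_i) over the product of the factor domains."""
    domains = [sorted(f) for f in factors]
    return {x: math.prod(f[xi] for f, xi in zip(factors, x))
            for x in itertools.product(*domains)}


# Hypothetical correlated joint over two binary variables.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
q_star = product_of(marginals(p, 2))
q_other = product_of([{0: 0.6, 1: 0.4}, {0: 0.5, 1: 0.5}])
print(kl(p, q_star), kl(p, q_other))  # the first value is the smaller one
```

The reverse direction, KL(q || p), is the mean-field objective and generally has a different minimizer, so which direction an algorithm minimizes matters.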

M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search

Shen, Yelong, Chen, Jianshu, Huang, Po-Sen, Guo, Yuqing, Gao, Jianfeng

Neural Information Processing Systems, Feb-14-2020, 18:57:02 GMT

Learning to walk over a graph towards a target node for a given query and a source node is an important problem in applications such as knowledge base completion (KBC). It can be formulated as a reinforcement learning (RL) problem with a known state transition model. To overcome the challenge of sparse rewards, we develop a graph-walking agent called M-Walk, which consists of a deep recurrent neural network (RNN) and Monte Carlo Tree Search (MCTS). The RNN encodes the state (i.e., history of the walked path) and maps it separately to a policy and Q-values. In order to effectively train the agent from sparse rewards, we combine MCTS with the neural policy to generate trajectories yielding more positive rewards.

learning, m-walk, monte carlo tree search, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.64)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.62)
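
The abstract says M-Walk combines MCTS with a neural policy. A common way to do that, used in AlphaGo-style search, is the PUCT selection rule sketched below; we assume, rather than claim, that M-Walk's selection step has this shape. The priors stand in for the RNN policy's output, and the edge names and statistics are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class EdgeStats:
    prior: float            # P(s, a), e.g. from a policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # W(s, a), sum of backed-up returns

    @property
    def q(self) -> float:
        """Mean backed-up return Q(s, a); 0 for an unvisited edge."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(edges: Dict[str, EdgeStats], c_puct: float = 1.0) -> str:
    """Pick the edge maximizing Q + c_puct * P * sqrt(sum of N) / (1 + N)."""
    total = sum(e.visits for e in edges.values())

    def score(a: str) -> float:
        e = edges[a]
        return e.q + c_puct * e.prior * math.sqrt(total) / (1 + e.visits)

    return max(edges, key=score)


def backup(edge: EdgeStats, value: float) -> None:
    """Record one simulation's return on the edge it passed through."""
    edge.visits += 1
    edge.value_sum += value


# Hypothetical outgoing edges of one knowledge-graph node.
edges = {
    "relation_A": EdgeStats(prior=0.7, visits=1, value_sum=0.0),
    "relation_B": EdgeStats(prior=0.3, visits=3, value_sum=0.6),
}
choice = puct_select(edges)  # "relation_A": its prior outweighs B's higher Q
```

At a node with no visits yet, sqrt(sum of N) is zero, so every exploration bonus vanishes and ties fall to dictionary order; some implementations use sqrt(1 + sum of N) to avoid that.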

Convergence of Monte Carlo Tree Search in Simultaneous Move Games

Lisy, Viliam, Kovarik, Vojta, Lanctot, Marc, Bosansky, Branislav

Neural Information Processing Systems, Feb-14-2020, 17:58:35 GMT

In this paper, we study Monte Carlo tree search (MCTS) in zero-sum extensive-form games with perfect information and simultaneous moves. We present a general template of MCTS algorithms for these games, which can be instantiated by various selection methods. We formally prove that if a selection method is $\epsilon$-Hannan consistent in a matrix game and satisfies additional requirements on exploration, then the MCTS algorithm eventually converges to an approximate Nash equilibrium (NE) of the extensive-form game. We empirically evaluate this claim using regret matching and Exp3 as the selection methods on randomly generated and worst-case games. We confirm the formal result and show that additional MCTS variants also converge to approximate NE on the evaluated games.

monte carlo tree search, selection method, simultaneous move game, (2 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
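
The convergence result rests on a selection method with low regret in a single matrix game. As a sketch of that base case, here is regret matching run in self-play with full-information (expected-payoff) updates on a hypothetical 2x2 zero-sum game. Inside MCTS the paper applies selection methods at every simultaneous-move node using sampled rather than exact payoffs, which this sketch does not show.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def rm_strategy(regrets: List[float]) -> List[float]:
    """Regret matching: play in proportion to positive cumulative regret (uniform if none)."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    n = len(regrets)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n


def regret_matching_selfplay(A: Matrix, iters: int) -> Tuple[List[float], List[float]]:
    """Both players run regret matching; the row player maximizes x^T A y, the column player minimizes it."""
    m, n = len(A), len(A[0])
    reg_r, reg_c = [0.0] * m, [0.0] * n
    avg_r, avg_c = [0.0] * m, [0.0] * n
    for _ in range(iters):
        x, y = rm_strategy(reg_r), rm_strategy(reg_c)
        u_r = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]   # row payoff per pure action
        u_c = [-sum(A[i][j] * x[i] for i in range(m)) for j in range(n)]  # column payoff is -A
        v_r = sum(xi * ui for xi, ui in zip(x, u_r))
        v_c = sum(yj * uj for yj, uj in zip(y, u_c))
        for i in range(m):
            reg_r[i] += u_r[i] - v_r
            avg_r[i] += x[i] / iters
        for j in range(n):
            reg_c[j] += u_c[j] - v_c
            avg_c[j] += y[j] / iters
    return avg_r, avg_c


def exploitability(A: Matrix, x: List[float], y: List[float]) -> float:
    """Best-response gap max_i (A y)_i - min_j (x^T A)_j; zero exactly at a Nash equilibrium."""
    m, n = len(A), len(A[0])
    best_row = max(sum(A[i][j] * y[j] for j in range(n)) for i in range(m))
    best_col = min(sum(A[i][j] * x[i] for i in range(m)) for j in range(n))
    return best_row - best_col


A = [[2.0, -1.0], [-1.0, 1.0]]  # hypothetical zero-sum payoffs to the row player
x_avg, y_avg = regret_matching_selfplay(A, iters=10_000)
gap = exploitability(A, x_avg, y_avg)
```

For this matrix, each player's equilibrium strategy puts probability 0.4 on their first action, and the game value is 0.2. The per-iteration strategies keep cycling; it is the average strategies whose exploitability shrinks, roughly as 1/sqrt(T) under the standard regret-matching bound.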

Bayesian Mixture Modelling and Inference based Thompson Sampling in Monte-Carlo Tree Search

Bai, Aijun, Wu, Feng, Chen, Xiaoping

Neural Information Processing Systems, Feb-14-2020, 17:28:15 GMT

Monte-Carlo tree search is drawing great interest in the domain of planning under uncertainty, particularly when little or no domain knowledge is available. One of the central problems is the trade-off between exploration and exploitation. In this paper we present a novel Bayesian mixture modelling and inference based Thompson sampling approach to addressing this dilemma. The proposed Dirichlet-NormalGamma MCTS (DNG-MCTS) algorithm represents the uncertainty of the accumulated reward for actions in the MCTS search tree as a mixture of Normal distributions and performs Bayesian inference on it by choosing conjugate priors in the form of combinations of Dirichlet and NormalGamma distributions. Thompson sampling is used to select the best action at each decision node.

bayesian mixture modelling and inference, monte-carlo tree search, thompson sampling, (1 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
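
Thompson sampling with a Normal-Gamma conjugate prior is the core of the selection step described above. The sketch below keeps only that part: each action's return is modeled as a single Normal with unknown mean and precision, and the Dirichlet mixture component of DNG-MCTS is dropped. The prior hyperparameters (mu0 = 0, lambda0 = alpha0 = beta0 = 1) and the reward data are assumptions.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class NormalGammaArm:
    """Normal-Gamma posterior over the unknown mean and precision of one action's return."""
    mu0: float = 0.0
    lambda0: float = 1.0
    alpha0: float = 1.0
    beta0: float = 1.0
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def observe(self, x: float) -> None:
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def posterior(self):
        """Standard conjugate update from the sufficient statistics (n, sum, sum of squares)."""
        if self.n == 0:
            return self.mu0, self.lambda0, self.alpha0, self.beta0
        mean = self.total / self.n
        ss = max(self.total_sq - self.n * mean * mean, 0.0)  # sum of squared deviations
        lam_n = self.lambda0 + self.n
        mu_n = (self.lambda0 * self.mu0 + self.n * mean) / lam_n
        alpha_n = self.alpha0 + self.n / 2
        beta_n = (self.beta0 + 0.5 * ss
                  + self.lambda0 * self.n * (mean - self.mu0) ** 2 / (2 * lam_n))
        return mu_n, lam_n, alpha_n, beta_n

    def sample_mean(self, rng: random.Random) -> float:
        """Draw precision tau ~ Gamma(alpha, rate=beta), then mean ~ Normal(mu, 1/(lambda*tau))."""
        mu_n, lam_n, alpha_n, beta_n = self.posterior()
        tau = rng.gammavariate(alpha_n, 1.0 / beta_n)  # gammavariate takes a scale, so pass 1/rate
        return rng.gauss(mu_n, 1.0 / math.sqrt(lam_n * tau))


def thompson_select(arms, rng: random.Random) -> str:
    """Sample a plausible mean for every action and act greedily on the samples."""
    return max(arms, key=lambda a: arms[a].sample_mean(rng))


rng = random.Random(1)
arms = {"a": NormalGammaArm(), "b": NormalGammaArm()}
for x in [0.9, 1.1] * 20:    # hypothetical returns observed for action "a"
    arms["a"].observe(x)
for x in [-0.1, 0.1] * 20:   # hypothetical returns observed for action "b"
    arms["b"].observe(x)
picks = [thompson_select(arms, rng) for _ in range(100)]
```

Sampling from the posterior rather than acting on its mean is what produces exploration: an action with few observations has a wide posterior and is occasionally sampled high enough to be tried.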

Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning

Grill, Jean-Bastien, Valko, Michal, Munos, Remi

Neural Information Processing Systems, Feb-14-2020, 16:26:45 GMT

We study the sampling-based planning problem in Markov decision processes (MDPs) that we can access only through a generative model, usually referred to as Monte-Carlo planning. Our objective is to return a good estimate of the optimal value function at any state while minimizing the number of calls to the generative model, i.e. the sample complexity. We propose a new algorithm, TrailBlazer, able to handle MDPs with a finite or an infinite number of transitions from state-action pairs to next states. TrailBlazer is an adaptive algorithm that exploits possible structures of the MDP by exploring only a subset of states reachable by following near-optimal policies. We provide bounds on its sample complexity that depend on a measure of the quantity of near-optimal states.

blazing, sample-efficient monte-carlo planning, trailblazer, (4 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
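
TrailBlazer aims to call the generative model far less often than uniform lookahead does. For contrast, the sketch below is the classical non-adaptive sparse-sampling algorithm of Kearns, Mansour and Ng, which samples every action a fixed number of times at every node; it is not TrailBlazer. A counting wrapper makes the sample complexity visible, and the toy MDP and parameters are hypothetical.

```python
import random
from typing import Callable, Hashable, Sequence, Tuple

# Generative model: (state, action, rng) -> (next_state, reward).
GenerativeModel = Callable[[Hashable, Hashable, random.Random], Tuple[Hashable, float]]


class CountingModel:
    """Wraps a generative model and counts its calls, i.e. the sample complexity."""

    def __init__(self, model: GenerativeModel):
        self.model = model
        self.calls = 0

    def __call__(self, state, action, rng):
        self.calls += 1
        return self.model(state, action, rng)


def sparse_sampling_value(model, state, actions: Sequence[Hashable], depth: int,
                          width: int, gamma: float, rng: random.Random) -> float:
    """Estimate V*(state) by drawing `width` samples per action at every node, down to `depth`."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(width):
            next_state, r = model(state, a, rng)
            total += r + gamma * sparse_sampling_value(
                model, next_state, actions, depth - 1, width, gamma, rng)
        best = max(best, total / width)
    return best


def toy_model(state, action, rng):
    """Hypothetical deterministic MDP: action 1 pays 1, action 0 pays 0, the state never changes."""
    return state, float(action)


model = CountingModel(toy_model)
v = sparse_sampling_value(model, 0, actions=[0, 1], depth=3, width=2,
                          gamma=0.9, rng=random.Random(0))
# v is 1 + 0.9 + 0.81 = 2.71; model.calls is 84 = 4 + 4 * (4 + 4 * 4)
```

The call count grows roughly as (width * |A|)^depth however suboptimal a branch is; exploring only states reachable under near-optimal policies is how an adaptive method like TrailBlazer can do better.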

Attractor Network Dynamics Enable Preplay and Rapid Path Planning in Maze–like Environments

Corneil, Dane S., Gerstner, Wulfram

Neural Information Processing Systems, Feb-14-2020, 09:56:46 GMT

Rodents navigating in a well-known environment can rapidly learn and revisit observed reward locations, often after a single trial. While the mechanism for rapid path planning is unknown, the CA3 region in the hippocampus plays an important role, and emerging evidence suggests that place cell activity during hippocampal preplay periods may trace out future goal-directed trajectories. Here, we show how a particular mapping of space allows for the immediate generation of trajectories between arbitrary start and goal locations in an environment, based only on the mapped representation of the goal. We show that this representation can be implemented in a neural attractor network model, resulting in bump-like activity profiles resembling those of the CA3 region of hippocampus. Neurons tend to locally excite neurons with similar place field centers, while inhibiting other neurons with distant place field centers, such that stable bumps of activity can form at arbitrary locations in the environment.

attractor network dynamic enable preplay, goal location, preplay and rapid path planning, (6 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.67)
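
The "local excitation, distant inhibition" connectivity can be sketched on a one-dimensional ring of place cells, a standard ring-attractor toy rather than the paper's two-dimensional model. The weight profile J1 cos(delta theta) - J0, the values J1 = 12 and J0 = 4, the clipped-linear rate function and the cue are all assumptions. The point is that a bump of activity started by a transient cue persists at the cue's location after the cue is gone.

```python
import math
from typing import List


def ring_weights(n: int, j_exc: float, j_inh: float) -> List[List[float]]:
    """w_ij = (j_exc * cos(theta_i - theta_j) - j_inh) / n: nearby centers excite, distant ones inhibit."""
    thetas = [2 * math.pi * i / n for i in range(n)]
    return [[(j_exc * math.cos(thetas[i] - thetas[j]) - j_inh) / n for j in range(n)]
            for i in range(n)]


def step(w: List[List[float]], r: List[float]) -> List[float]:
    """One synchronous update with a clipped-linear rate function bounded to [0, 1]."""
    n = len(r)
    return [min(1.0, max(0.0, sum(w[i][j] * r[j] for j in range(n)))) for i in range(n)]


def decode_angle(r: List[float]) -> float:
    """Population-vector estimate of the bump's location on the ring."""
    n = len(r)
    s = sum(ri * math.sin(2 * math.pi * i / n) for i, ri in enumerate(r))
    c = sum(ri * math.cos(2 * math.pi * i / n) for i, ri in enumerate(r))
    return math.atan2(s, c)


n = 64
w = ring_weights(n, j_exc=12.0, j_inh=4.0)
# Transient cue: a block of neurons around index 16 (angle pi/2), used only as the initial state.
r = [1.0 if abs(i - 16) <= 8 else 0.0 for i in range(n)]
for _ in range(50):  # the network then evolves with no external input
    r = step(w, r)
```

Because the connectivity is rotation-invariant, a bump could sit at any angle; where it settles is set by the initial cue, which is one way to read the claim that stable bumps can form at arbitrary locations.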
