A Unified Perspective on Value Backup and Exploration in Monte-Carlo Tree Search Artificial Intelligence

Monte-Carlo Tree Search (MCTS) is a class of methods for solving complex decision-making problems through the synergy of Monte-Carlo planning and Reinforcement Learning (RL). The highly combinatorial nature of the problems commonly addressed by MCTS requires the use of efficient exploration strategies for navigating the planning tree and quickly convergent value backup methods. These crucial problems are particularly evident in recent advances that combine MCTS with deep neural networks for function approximation. In this work, we propose two methods for improving the convergence rate and exploration based on a newly introduced backup operator and entropy regularization. We provide strong theoretical guarantees to bound convergence rate, approximation error, and regret of our methods. Moreover, we introduce a mathematical framework based on the use of the $\alpha$-divergence for backup and exploration in MCTS. We show that this theoretical formulation unifies different approaches, including our newly introduced ones, under the same mathematical framework, allowing to obtain different methods by simply changing the value of $\alpha$. In practice, our unified perspective offers a flexible way to balance between exploration and exploitation by tuning the single $\alpha$ parameter according to the problem at hand. We validate our methods through a rigorous empirical study from basic toy problems to the complex Atari games, and including both MDP and POMDP problems.

Task Modifiers for HTN Planning and Acting Artificial Intelligence

The ability of an agent to change its objectives in response to unexpected events is desirable in dynamic environments. In order to provide this capability to hierarchical task network (HTN) planning, we propose an extension of the paradigm called task modifiers, which are functions that receive a task list and a state and produce a new task list. We focus on a particular type of problems in which planning and execution are interleaved and the ability to handle exogenous events is crucial. To determine the efficacy of this approach, we evaluate the performance of our task modifier implementation in two environments, one of which is a simulation that differs substantially from traditional HTN domains.

Adaptive Information Belief Space Planning Artificial Intelligence

Reasoning about uncertainty is vital in many real-life autonomous systems. However, current state-of-the-art planning algorithms cannot either reason about uncertainty explicitly, or do so with a high computational burden. Here, we focus on making informed decisions efficiently, using reward functions that explicitly deal with uncertainty. We formulate an approximation, namely an abstract observation model, that uses an aggregation scheme to alleviate computational costs. We derive bounds on the expected information-theoretic reward function and, as a consequence, on the value function. We then propose a method to refine aggregation to achieve identical action selection with a fraction of the computational time.

Online Planning in POMDPs with Self-Improving Simulators Artificial Intelligence

How can we plan efficiently in a large and complex environment when the time budget is limited? However, there are three main limitations of this "twophase" Given the original simulator of the environment, paradigm, where a simulator is learned offline and which may be computationally very demanding, we then used as-is for online simulation and planning. First, no propose to learn online an approximate but much planning is possible until the offline learning phase finishes, faster simulator that improves over time. To plan which can take a long time. Second, the separation of learning reliably and efficiently while the approximate simulator and planning raises a question on what data collection policy is learning, we develop a method that adaptively should be used during training to ensure good online prediction decides which simulator to use for every simulation, during planning. We empirically demonstrate that when based on a statistic that measures the accuracy the training data is collected by a uniform random policy, the of the approximate simulator. This allows us to learned influence predictors can perform poorly during online use the approximate simulator to replace the original planning, due to distribution shift. Third, completely replacing simulator for faster simulations when it is accurate the original simulator with the approximate one after enough under the current context, thus trading training implies a risk of poor planning performance in certain off simulation speed and accuracy. Experimental situations, which is hard to detect in advance.

A Survey of Opponent Modeling in Adversarial Domains

Journal of Artificial Intelligence Research

Opponent modeling is the ability to use prior knowledge and observations in order to predict the behavior of an opponent. This survey presents a comprehensive overview of existing opponent modeling techniques for adversarial domains, many of which must address stochastic, continuous, or concurrent actions, and sparse, partially observable payoff structures. We discuss all the components of opponent modeling systems, including feature extraction, learning algorithms, and strategy abstractions. These discussions lead us to propose a new form of analysis for describing and predicting the evolution of game states over time. We then introduce a new framework that facilitates method comparison, analyze a representative selection of techniques using the proposed framework, and highlight common trends among recently proposed methods. Finally, we list several open problems and discuss future research directions inspired by AI research on opponent modeling and related research in other disciplines.

Forecasting: theory and practice Machine Learning

Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we wish that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow the readers to navigate through the various topics. We complement the theoretical concepts and applications covered by large lists of free or open-source software implementations and publicly-available databases.

Visual Learning-based Planning for Continuous High-Dimensional POMDPs Artificial Intelligence

The Partially Observable Markov Decision Process (POMDP) is a powerful framework for capturing decision-making problems that involve state and transition uncertainty. However, most current POMDP planners cannot effectively handle very high-dimensional observations they often encounter in the real world (e.g. image observations in robotic domains). In this work, we propose Visual Tree Search (VTS), a learning and planning procedure that combines generative models learned offline with online model-based POMDP planning. VTS bridges offline model training and online planning by utilizing a set of deep generative observation models to predict and evaluate the likelihood of image observations in a Monte Carlo tree search planner. We show that VTS is robust to different observation noises and, since it utilizes online, model-based planning, can adapt to different reward structures without the need to re-train. This new approach outperforms a baseline state-of-the-art on-policy planning algorithm while using significantly less offline training time.

Learning to Explore by Reinforcement over High-Level Options Artificial Intelligence

Autonomous 3D environment exploration is a fundamental task for various applications such as navigation. The goal of exploration is to investigate a new environment and build its occupancy map efficiently. In this paper, we propose a new method which grants an agent two intertwined options of behaviors: "look-around" and "frontier navigation". This is implemented by an option-critic architecture and trained by reinforcement learning algorithms. In each timestep, an agent produces an option and a corresponding action according to the policy. We also take advantage of macro-actions by incorporating classic path-planning techniques to increase training efficiency. We demonstrate the effectiveness of the proposed method on two publicly available 3D environment datasets and the results show our method achieves higher coverage than competing techniques with better efficiency.

Semantic Sensing and Planning for Human-Robot Collaboration in Uncertain Environments Artificial Intelligence

Autonomous robots can benefit greatly from human-provided semantic characterizations of uncertain task environments and states. However, the development of integrated strategies which let robots model, communicate, and act on such soft data remains challenging. Here, a framework is presented for active semantic sensing and planning in human-robot teams which addresses these gaps by formally combining the benefits of online sampling-based POMDP policies, multi-modal semantic interaction, and Bayesian data fusion. This approach lets humans opportunistically impose model structure and extend the range of semantic soft data in uncertain environments by sketching and labeling arbitrary landmarks across the environment. Dynamic updating of the environment while searching for a mobile target allows robotic agents to actively query humans for novel and relevant semantic data, thereby improving beliefs of unknown environments and target states for improved online planning. Target search simulations show significant improvements in time and belief state estimates required for interception versus conventional planning based solely on robotic sensing. Human subject studies demonstrate a average doubling in dynamic target capture rate compared to the lone robot case, employing reasoning over a range of user characteristics and interaction modalities. Video of interaction can be found at

Goal Agnostic Planning using Maximum Likelihood Paths in Hypergraph World Models Artificial Intelligence

In this paper, we present a hypergraph--based machine learning algorithm, a datastructure--driven maintenance method, and a planning algorithm based on a probabilistic application of Dijkstra's algorithm. Together, these form a goal agnostic automated planning engine for an autonomous learning agent which incorporates beneficial properties of both classical Machine Learning and traditional Artificial Intelligence. We prove that the algorithm determines optimal solutions within the problem space, mathematically bound learning performance, and supply a mathematical model analyzing system state progression through time yielding explicit predictions for learning curves, goal achievement rates, and response to abstractions and uncertainty. To validate performance, we exhibit results from applying the agent to three archetypal planning problems, including composite hierarchical domains, and highlight empirical findings which illustrate properties elucidated in the analysis.