Kartal, Bilal
On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman
Gao, Chao, Kartal, Bilal, Hernandez-Leal, Pablo, Taylor, Matthew E.
How to best explore in domains with sparse, delayed, and deceptive rewards is an important open problem for reinforcement learning (RL). This paper considers one such domain, the recently proposed multi-agent benchmark of Pommerman. This domain is very challenging for RL: past work has shown that model-free RL algorithms fail to achieve significant learning without artificially reducing the environment's complexity. In this paper, we illuminate the reasons behind this failure by providing a thorough analysis of the hardness of random exploration in Pommerman. While model-free random exploration is typically futile, we develop a model-based automatic reasoning module that can be used for safer exploration by pruning actions that will surely lead the agent to death. We empirically demonstrate that this module can significantly improve learning.
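To make the pruning idea concrete, here is a minimal sketch of a one-step look-ahead rule on a simplified grid view of a Pommerman-like board. The names (blast_cells, prune_unsafe_actions) and the one-step horizon are illustrative assumptions, not the paper's actual reasoning module:

    # Illustrative one-step-ahead action pruning for a Pommerman-like grid.
    from typing import List, Set, Tuple

    Pos = Tuple[int, int]

    MOVES = {0: (0, 0), 1: (-1, 0), 2: (1, 0), 3: (0, -1), 4: (0, 1)}  # stop/up/down/left/right

    def blast_cells(bombs: List[Tuple[Pos, int, int]], ticks: int) -> Set[Pos]:
        """Cells covered by flames of bombs exploding within `ticks` steps.
        Each bomb is (position, fuse_remaining, blast_strength)."""
        danger: Set[Pos] = set()
        for (r, c), fuse, strength in bombs:
            if fuse <= ticks:
                danger.add((r, c))
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    for k in range(1, strength):
                        danger.add((r + dr * k, c + dc * k))
        return danger

    def prune_unsafe_actions(agent_pos: Pos, bombs, passable: Set[Pos]) -> List[int]:
        """Keep only actions that do not put the agent on a cell flamed next tick."""
        danger = blast_cells(bombs, ticks=1)
        safe = []
        for action, (dr, dc) in MOVES.items():
            nxt = (agent_pos[0] + dr, agent_pos[1] + dc)
            if (nxt == agent_pos or nxt in passable) and nxt not in danger:
                safe.append(action)
        return safe or [0]  # if everything looks fatal, fall back to "stop"

During training, such a filter would simply mask out the pruned actions before the agent samples from its policy, so exploration only wanders among moves that are not immediately fatal.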
Action Guidance with MCTS for Deep Reinforcement Learning
Kartal, Bilal, Hernandez-Leal, Pablo, Taylor, Matthew E.
Deep reinforcement learning has achieved great successes in recent years; however, one main challenge is sample inefficiency. In this paper, we focus on how to use action guidance by means of a non-expert demonstrator to improve sample efficiency in a domain with sparse, delayed, and possibly deceptive rewards: the recently proposed multi-agent benchmark of Pommerman. We propose a new framework where even a non-expert simulated demonstrator, e.g., planning algorithms such as Monte Carlo tree search with a small number of rollouts, can be integrated within asynchronous distributed deep reinforcement learning methods. Compared to a vanilla deep RL algorithm, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game.
Introduction
Deep reinforcement learning (DRL) has enabled better scalability and generalization for challenging domains (Arulkumaran et al. 2017; Li 2017; Hernandez-Leal, Kartal, and Taylor 2018) such as Atari games (Mnih et al. 2015), Go (Silver et al. 2016), and multiagent games (e.g., StarCraft II and Dota 2) (OpenAI 2018). However, one of the current biggest challenges for DRL is sample efficiency (Yu 2018). On the one hand, once a DRL agent is trained, it can be deployed to act in real time by only performing an inference through the trained model. On the other hand, planning methods such as Monte Carlo tree search (MCTS) (Browne et al. 2012) do not have a training phase, but they perform computationally costly simulation-based rollouts (assuming access to a simulator) to find the best action to take. There are several ways to get the best of both DRL and search methods.
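One simple way to picture this kind of integration is an auxiliary supervised term that pulls the policy toward the planner's chosen action. The sketch below assumes PyTorch, a network that outputs action logits and a state value, and demonstrator actions collected from shallow MCTS running alongside the learner; the coefficients and names are illustrative, not the authors' exact loss:

    # Actor-critic loss plus an auxiliary imitation term toward a demonstrator.
    import torch.nn.functional as F

    def a3c_with_demo_loss(logits, values, actions, returns, demo_actions, demo_weight=1.0):
        log_probs = F.log_softmax(logits, dim=-1)
        advantages = returns - values.detach()
        policy_loss = -(log_probs.gather(1, actions.unsqueeze(1)).squeeze(1) * advantages).mean()
        value_loss = F.mse_loss(values, returns)
        entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
        # Auxiliary supervised term: pull the policy toward the planner's action.
        demo_loss = F.cross_entropy(logits, demo_actions)
        return policy_loss + 0.5 * value_loss - 0.01 * entropy + demo_weight * demo_loss

Because the demonstrator only contributes an extra loss term, it can be cheap and non-expert: even shallow MCTS rollouts provide a useful training signal without being followed at execution time.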
Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning
Kartal, Bilal, Hernandez-Leal, Pablo, Taylor, Matthew E.
Deep reinforcement learning has achieved great successes in recent years, but there are still open challenges, such as convergence to locally optimal policies and sample inefficiency. In this paper, we contribute a novel self-supervised auxiliary task, Terminal Prediction (TP), which estimates temporal closeness to terminal states for episodic tasks. The intuition is to help representation learning by letting the agent predict how close it is to a terminal state while learning its control policy. Although TP could be integrated with multiple algorithms, this paper focuses on Asynchronous Advantage Actor-Critic (A3C) and demonstrates the advantages of A3C-TP. Our extensive evaluation includes a set of Atari games, the BipedalWalker domain, and a mini version of the recently proposed multi-agent Pommerman game. Our results on Atari games and the BipedalWalker domain suggest that A3C-TP outperforms standard A3C in most of the tested domains and performs comparably in the others. In Pommerman, our proposed method provides significant improvements both in learning efficiency and in converging to better policies against different opponents.
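A minimal sketch of how such a terminal-prediction term could be attached to an actor-critic loss follows, assuming PyTorch and a network with an extra scalar prediction head; the normalized target t / (T - 1) and the weight are illustrative choices rather than values taken verbatim from the paper:

    # Self-supervised terminal-prediction auxiliary term for an episodic task.
    import torch
    import torch.nn.functional as F

    def terminal_prediction_targets(episode_length: int) -> torch.Tensor:
        """Target for step t is t / (T - 1): 0 at the first state, 1 at the terminal state."""
        T = episode_length
        return torch.arange(T, dtype=torch.float32) / max(T - 1, 1)

    def a3c_tp_loss(a3c_loss, tp_predictions, tp_targets, tp_weight=0.5):
        """Standard A3C loss plus the self-supervised terminal-prediction term."""
        return a3c_loss + tp_weight * F.mse_loss(tp_predictions, tp_targets)

The targets are only known once the episode finishes, which fits naturally with on-policy methods like A3C that update from completed (or truncated) rollouts.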
Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition
Gao, Chao, Hernandez-Leal, Pablo, Kartal, Bilal, Taylor, Matthew E.
The Pommerman Team Environment is a recently proposed benchmark which involves a multi-agent domain with challenges such as partial observability, decentralized execution (without communication), and very sparse and delayed rewards. The inaugural Pommerman Team Competition held at NeurIPS 2018 hosted 25 participants who each submitted a team of 2 agents. Our submission nn_team_skynet955_skynet955 won 2nd place in the "learning agents" category. Our team is composed of 2 neural networks trained with state-of-the-art deep reinforcement learning algorithms and makes use of concepts like reward shaping, curriculum learning, and an automatic reasoning module for action pruning. Here, we describe these elements and additionally present a collection of open-sourced agents that can be used for training and testing in the Pommerman environment. Code available at: https://github.com/BorealisAI/pommerman-baseline
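As a rough illustration of the reward-shaping ingredient, the sketch below combines the sparse win/loss outcome with small dense per-step bonuses; the event names and coefficients are placeholders, not the shaping terms actually used by the skynet955 agents:

    # Illustrative dense reward shaping on top of the sparse game outcome.
    def shaped_reward(events: dict) -> float:
        reward = 1.0 if events.get("won") else (-1.0 if events.get("died") else 0.0)
        reward += 0.1 * events.get("woods_destroyed", 0)      # cleared wooden walls
        reward += 0.1 * events.get("powerups_collected", 0)   # picked-up power-ups
        reward += 0.05 if events.get("reached_new_cell") else 0.0
        return reward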
Safer Deep RL with Shallow MCTS: A Case Study in Pommerman
Kartal, Bilal, Hernandez-Leal, Pablo, Gao, Chao, Taylor, Matthew E.
Safe reinforcement learning has many variants and it is still an open research problem. Here, we focus on how to use action guidance by means of a non-expert demonstrator to avoid catastrophic events in a domain with sparse, delayed, and deceptive rewards: the recently proposed multi-agent benchmark of Pommerman. This domain is very challenging for reinforcement learning (RL): past work has shown that model-free RL algorithms fail to achieve significant learning. In this paper, we shed light on the reasons behind this failure by exemplifying and analyzing the high rate of catastrophic events (i.e., suicides) that happen under random exploration in this domain. While model-free random exploration is typically futile, we propose a new framework where even a non-expert simulated demonstrator, e.g., planning algorithms such as Monte Carlo tree search with a small number of rollouts, can be integrated into asynchronous distributed deep reinforcement learning methods. Compared to vanilla deep RL algorithms, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game.
Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL
Kartal, Bilal, Hernandez-Leal, Pablo, Taylor, Matthew E.
Deep reinforcement learning (DRL) has achieved great successes in recent years with the help of novel methods and higher compute power. However, there are still several challenges to be addressed, such as convergence to locally optimal policies and long training times. In this paper, we first augment the Asynchronous Advantage Actor-Critic (A3C) method with a novel self-supervised auxiliary task, Terminal Prediction, which measures temporal closeness to terminal states, yielding A3C-TP. Second, we propose a new framework where planning algorithms such as Monte Carlo tree search or other sources of (simulated) demonstrators can be integrated into asynchronous distributed DRL methods. Compared to vanilla A3C, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game.
Is multiagent deep reinforcement learning the answer or the question? A brief survey
Hernandez-Leal, Pablo, Kartal, Bilal, Taylor, Matthew E.
Deep reinforcement learning (DRL) has achieved outstanding results in recent years. This has led to a dramatic increase in the number of applications and methods. Recent works have explored learning beyond single-agent scenarios and have considered multiagent scenarios. Initial results report successes in complex multiagent domains, although there are several challenges to be addressed. In this context, first, this article provides a clear overview of current multiagent deep reinforcement learning (MDRL) literature. Second, it provides guidelines to complement this emerging area by (i) showcasing examples of how methods and algorithms from DRL and multiagent learning (MAL) have helped solve problems in MDRL and (ii) providing general lessons learned from these works. We expect this article will help unify and motivate future research to take advantage of the abundant literature that exists in both areas (DRL and MAL) in a joint effort to promote fruitful research in the multiagent community.
Data Driven Sokoban Puzzle Generation with Monte Carlo Tree Search
Kartal, Bilal (University of Minnesota) | Sohre, Nick (University of Minnesota) | Guy, Stephen J. (University of Minnesota)
In this work, we propose a Monte Carlo Tree Search (MCTS) based approach to procedurally generate Sokoban puzzles. Our method generates puzzles through simulated game play, guaranteeing solvability in all generated puzzles. We perform a user study to infer features that are efficient to compute and are highly correlated with expected puzzle difficulty. We combine several of these features into a data-driven evaluation function for MCTS puzzle creation. The resulting algorithm is efficient and can be run in an anytime manner, capable of quickly generating a variety of challenging puzzles. We perform a second user study to validate the predictive capability of our approach, showing a high correlation between increasing puzzle scores and perceived difficulty.
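The data-driven evaluation can be pictured as a weighted sum of cheap-to-compute features used to score puzzles during MCTS generation; the feature names and weights below are placeholders, not the coefficients fitted from the user study:

    # Illustrative weighted-feature evaluation for MCTS puzzle generation.
    from typing import Optional

    def puzzle_score(features: dict, weights: Optional[dict] = None) -> float:
        weights = weights or {
            "num_boxes": 0.3,           # more boxes tend to raise difficulty
            "solution_length": 0.5,     # longer minimal push sequences
            "terrain_congestion": 0.2,  # tighter corridors and dead ends
        }
        return sum(w * features.get(name, 0.0) for name, w in weights.items())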
Monte Carlo Tree Search for Multi-Robot Task Allocation
Kartal, Bilal (University of Minnesota) | Nunes, Ernesto (University of Minnesota) | Godoy, Julio (University of Minnesota) | Gini, Maria (University of Minnesota)
Multi-robot teams are useful in a variety of task allocation domains such as warehouse automation and surveillance. Robots in such domains perform tasks at given locations and specific times, and are allocated tasks to optimize given team objectives. We propose an efficient, satisficing, and centralized Monte Carlo Tree Search based algorithm that exploits the branch-and-bound paradigm to solve the multi-robot task allocation problem with spatial, temporal, and other side constraints. Unlike previous heuristics proposed for this problem, our approach offers theoretical guarantees and finds optimal solutions for some non-trivial data sets.
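The branch-and-bound ingredient can be sketched as a pruning test applied when expanding a node of the allocation search tree: a child is discarded when an optimistic bound on the objective reachable from it cannot beat the best complete allocation found so far. The node and bound interfaces below are assumptions for illustration:

    # Illustrative branch-and-bound pruning inside tree-search node expansion.
    def expand_with_bounding(partial_allocation, children, upper_bound, incumbent_value):
        """Return only the children whose optimistic bound can still beat the incumbent."""
        return [child for child in children(partial_allocation)
                if upper_bound(child) > incumbent_value]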
Generating Believable Stories in Large Domains
Kartal, Bilal (University of Minnesota) | Koenig, John (University of Minnesota) | Guy, Stephen J. (University of Minnesota)
Planning-based techniques are a very powerful tool for automated story generation. However, as the number of possible actions increases, traditional planning techniques suffer from a combinatorial explosion due to large branching factors. In this work, we apply Monte Carlo Tree Search (MCTS) techniques to generate stories in domains with large numbers of possible actions (100+). Our approach employs a Bayesian story evaluation method to guide the planning towards believable stories that reach a user-defined goal. We generate stories in a novel domain with different types of story goals. Our approach shows an order-of-magnitude improvement in performance over traditional search techniques.
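One way such a Bayesian evaluation could look is a bigram model over successive story actions, fit on example stories and used to score rollouts during search; the model below is a simplified stand-in, not the paper's evaluation function:

    # Illustrative bigram believability score for guiding story rollouts.
    from collections import defaultdict
    import math

    class BigramBelievability:
        def __init__(self):
            self.pair_counts = defaultdict(int)
            self.prev_counts = defaultdict(int)

        def fit(self, example_stories):
            """example_stories: iterable of sequences of action names."""
            for story in example_stories:
                for prev, nxt in zip(story, story[1:]):
                    self.pair_counts[(prev, nxt)] += 1
                    self.prev_counts[prev] += 1

        def score(self, story):
            """Geometric mean of smoothed P(action_i | action_{i-1}) over the story."""
            if len(story) < 2:
                return 1.0
            log_p = 0.0
            for prev, nxt in zip(story, story[1:]):
                p = (self.pair_counts[(prev, nxt)] + 1) / (self.prev_counts[prev] + 10)
                log_p += math.log(p)
            return math.exp(log_p / (len(story) - 1))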