Action Guidance with MCTS for Deep Reinforcement Learning
Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor
Deep reinforcement learning has achieved great success in recent years; however, one main challenge is sample inefficiency. In this paper, we focus on how to use action guidance by means of a non-expert demonstrator to improve sample efficiency in a domain with sparse, delayed, and possibly deceptive rewards: the recently proposed multi-agent benchmark of Pommerman. We propose a new framework in which even a non-expert simulated demonstrator, e.g., a planning algorithm such as Monte Carlo tree search with a small number of rollouts, can be integrated within asynchronous distributed deep reinforcement learning methods. Compared to a vanilla deep RL algorithm, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game.

Introduction

Deep reinforcement learning (DRL) has enabled better scalability and generalization for challenging domains (Arulkumaran et al. 2017; Li 2017; Hernandez-Leal, Kartal, and Taylor 2018) such as Atari games (Mnih et al. 2015), Go (Silver et al. 2016), and multiagent games, e.g., StarCraft II and Dota 2 (OpenAI 2018). However, one of the biggest current challenges for DRL is sample efficiency (Yu 2018). On the one hand, once a DRL agent is trained, it can act in real time by performing only an inference pass through the trained model. On the other hand, planning methods such as Monte Carlo tree search (MCTS) (Browne et al. 2012) have no training phase, but they perform computationally costly simulation-based rollouts (assuming access to a simulator) to find the best action to take. There are several ways to get the best of both DRL and search methods.
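To make the idea of a cheap, non-expert MCTS demonstrator concrete, the sketch below runs a handful of shallow random rollouts and returns the most-visited root action. This is an illustrative sketch, not the paper's implementation: the `env` simulator interface (`clone()`, `legal_actions()`, and `step()` returning a reward and a done flag) and all parameter defaults are assumptions introduced here for illustration.

```python
# Minimal sketch of a shallow MCTS used as a non-expert demonstrator.
# The `env` interface (clone, legal_actions, step) is hypothetical and
# not taken from the paper.
import math
import random

def mcts_action(env, n_rollouts=50, horizon=20, c=1.4):
    """Suggest an action via UCB1 over a few shallow random rollouts.

    With a small rollout budget the suggestion is noisy -- a non-expert
    demonstrator -- but cheap enough to query inside an RL worker.
    """
    actions = env.legal_actions()
    counts = {a: 0 for a in actions}
    totals = {a: 0.0 for a in actions}
    for t in range(1, n_rollouts + 1):
        # Try each root action once, then select by the UCB1 score.
        a = next((x for x in actions if counts[x] == 0),
                 max(actions,
                     key=lambda x: totals[x] / counts[x]
                     + c * math.sqrt(math.log(t) / counts[x])))
        # Evaluate it with one short random rollout on a cloned state.
        sim = env.clone()
        reward, done = sim.step(a)
        ret = reward
        for _ in range(horizon):
            if done:
                break
            reward, done = sim.step(random.choice(sim.legal_actions()))
            ret += reward
        counts[a] += 1
        totals[a] += ret
    # The most-visited root action is the demonstrator's suggestion.
    return max(actions, key=lambda a: counts[a])
```

In the asynchronous distributed setting the paper describes, such a suggestion could be consumed, for example, as an auxiliary supervised (cross-entropy) term added to each worker's actor-critic loss, so that the learner is guided by, rather than forced to copy, the demonstrator; the paper's exact integration may differ.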
Jul-25-2019