Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search

May-22-2020–arXiv.org Artificial Intelligence

The combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement learning is the state of the art in two-player perfect-information games. In this paper, we describe a search algorithm that uses a variant of MCTS enhanced with 1) a novel action value normalization mechanism for games with potentially unbounded rewards (as is the case in many optimization problems), 2) a virtual loss function that enables effective search parallelization, and 3) a policy network, trained over generations of self-play, that guides the search. We gauge the effectiveness of our method in "SameGame", a popular single-player test domain. Our experimental results indicate that our method outperforms baseline algorithms on several board sizes. Additionally, it is competitive with state-of-the-art search algorithms on a public set of positions.
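The abstract names three ingredients but does not spell out how the normalization or the virtual loss is defined, so the sketch below is only a generic illustration of how such pieces can fit together in a PUCT-style selection step. It is not the authors' method. It assumes min-max normalization of Q-values over the final scores observed in the tree (the scheme popularized by MuZero), a virtual loss that scores each in-flight simulation as the worst score seen so far, and a `prior` field standing in for the policy network's output. The names `MinMaxStats`, `Node`, `normalized_q`, `select_child`, `add_virtual_loss`, `backup`, and the value `c_puct = 1.25` are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class MinMaxStats:
    """Tracks the lowest and highest final scores backed up so far (assumed scheme)."""

    def __init__(self) -> None:
        self.lo = math.inf
        self.hi = -math.inf

    def update(self, value: float) -> None:
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)

    def normalize(self, value: float) -> float:
        # Map a raw, possibly unbounded score into [0, 1]; with no spread yet, stay neutral.
        if self.hi > self.lo:
            return (value - self.lo) / (self.hi - self.lo)
        return 0.5


@dataclass
class Node:
    prior: float                 # P(s, a); hypothetically supplied by a policy network
    visits: int = 0              # completed simulations through this edge
    value_sum: float = 0.0       # sum of their final scores
    virtual_visits: int = 0      # simulations currently in flight through this edge
    children: Dict[str, "Node"] = field(default_factory=dict)


def normalized_q(child: Node, stats: MinMaxStats) -> float:
    n = child.visits + child.virtual_visits
    if n == 0:
        return 0.0  # unvisited edge: only the exploration term counts
    # Assumed virtual loss: each in-flight simulation counts as the worst score seen so far.
    worst = stats.lo if stats.lo < math.inf else 0.0
    mean = (child.value_sum + worst * child.virtual_visits) / n
    return stats.normalize(mean)


def select_child(node: Node, stats: MinMaxStats, c_puct: float = 1.25) -> Tuple[str, Node]:
    # Parent visit count includes in-flight simulations so parallel workers spread out.
    parent_n = sum(c.visits + c.virtual_visits for c in node.children.values())

    def score(child: Node) -> float:
        explore = c_puct * child.prior * math.sqrt(parent_n) / (1 + child.visits + child.virtual_visits)
        return normalized_q(child, stats) + explore

    return max(node.children.items(), key=lambda kv: score(kv[1]))


def add_virtual_loss(path: List[Node]) -> None:
    # Called when a worker starts a simulation down this path.
    for node in path:
        node.virtual_visits += 1


def backup(path: List[Node], final_score: float, stats: MinMaxStats) -> None:
    # Replace each virtual visit with a real one carrying the episode's final score.
    for node in path:
        node.virtual_visits -= 1
        node.visits += 1
        node.value_sum += final_score
    stats.update(final_score)


if __name__ == "__main__":
    stats = MinMaxStats()
    root = Node(prior=1.0, children={"a": Node(prior=0.5), "b": Node(prior=0.5)})
    for move, score in [("a", 300.0), ("b", 400.0)]:
        path = [root, root.children[move]]
        add_virtual_loss(path)
        backup(path, score, stats)
    print(select_child(root, stats)[0])  # prints b
    for _ in range(3):  # three workers are now exploring "b"
        add_virtual_loss([root, root.children["b"]])
    print(select_child(root, stats)[0])  # prints a
```

In the usage example, the raw scores 300 and 400 are mapped to 0 and 1, so the unbounded scale never reaches the exploration term directly. Once several simulations are in flight through "b", the pessimistic virtual loss lowers its normalized Q from 1.0 to 0.25, and a further worker is steered to "a" instead of duplicating work.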

algorithm, artificial intelligence, planning & scheduling

Country:
- North America
  - United States (0.04)
  - Canada > Alberta
    - Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
- Africa > Middle East
  - Djibouti > Arta > Arta (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Leisure & Entertainment > Games > Computer Games (0.68)

Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning
  - Search (1.00)
  - Planning & Scheduling (1.00)