Policy Gradient Search: Online Planning and Expert Iteration without Search Trees

Anthony, Thomas, Nishihara, Robert, Moritz, Philipp, Salimans, Tim, Schulman, John

Apr-7-2019–arXiv.org Machine Learning

Monte Carlo Tree Search (MCTS) algorithms perform simulation-based search to improve policies online. During search, the simulation policy is adapted to explore the most promising lines of play. MCTS has been used by state-of-the-art programs for many problems; however, a disadvantage of MCTS is that it estimates the values of states with Monte Carlo averages stored in a search tree, which does not scale to games with very high branching factors. We propose an alternative simulation-based search method, Policy Gradient Search (PGS), which adapts a neural network simulation policy online via policy gradient updates, avoiding the need for a search tree. In Hex, PGS achieves comparable performance to MCTS, and an agent trained using Expert Iteration with PGS was able to defeat MoHex 2.0, the strongest open-source Hex agent, in 9x9 Hex.
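
The core idea of adapting a simulation policy online with policy gradient updates, rather than storing value averages in a tree, can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several simplifying assumptions: a tabular softmax policy stands in for the neural network, the game is single-agent (a two-player version would flip the sign of the return for the opponent's moves), the update is plain REINFORCE without a baseline, and the final move is the root action with the highest adapted probability. All function names and the toy game are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_search(root, legal_actions, step, is_terminal, reward,
                           n_simulations=300, lr=0.5, seed=0):
    """Adapt a softmax policy by REINFORCE on simulations from `root`.

    Assumption: states are hashable and `root` is not terminal.
    Only per-state policy logits are kept; no visit counts or value
    averages are stored.
    """
    rng = random.Random(seed)
    logits = {}  # state -> list of logits, created lazily

    def action_probs(state):
        acts = legal_actions(state)
        if state not in logits:
            logits[state] = [0.0] * len(acts)
        return acts, softmax(logits[state])

    for _ in range(n_simulations):
        state, trajectory = root, []
        # Roll out one full simulation with the current policy.
        while not is_terminal(state):
            acts, p = action_probs(state)
            i = rng.choices(range(len(acts)), weights=p)[0]
            trajectory.append((state, i, p))
            state = step(state, acts[i])
        z = reward(state)  # terminal return of this simulation
        # REINFORCE: grad of log pi(a|s) w.r.t. logit k is 1[k == a] - p_k.
        for s, i, p in trajectory:
            for k in range(len(p)):
                grad = (1.0 if k == i else 0.0) - p[k]
                logits[s][k] += lr * z * grad

    acts, p = action_probs(root)
    best = max(range(len(acts)), key=lambda k: p[k])
    return acts[best], dict(zip(acts, p))


# Hypothetical one-move toy game: action 2 wins (+1), anything else loses (-1).
def toy_legal_actions(state):
    return [0, 1, 2]


def toy_step(state, action):
    return state + (action,)


def toy_is_terminal(state):
    return len(state) == 1


def toy_reward(state):
    return 1.0 if state == (2,) else -1.0
```

Calling `policy_gradient_search((), toy_legal_actions, toy_step, toy_is_terminal, toy_reward)` returns the chosen root action together with the adapted root probabilities. Because the only state that persists between simulations is the policy's parameters, memory does not grow with the branching factor the way a search tree does; in this tabular stand-in it grows with the number of distinct states visited, which a shared neural network would avoid.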

algorithm, artificial intelligence, machine learning

Country:
- North America
  - United States > California
    - Alameda County > Berkeley (0.04)
  - Canada
    - Alberta (0.14)
    - Quebec > Montreal (0.04)
- Europe > Netherlands
  - North Holland > Amsterdam (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.82)

Industry:
- Leisure & Entertainment > Games (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Search (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (0.68)
    - Statistical Learning > Gradient Descent (0.48)
