Monte-Carlo Tree Search for Policy Optimization

Ma, Xiaobai, Driggs-Campbell, Katherine, Zhang, Zongzhang, Kochenderfer, Mykel J.

Dec-23-2019–arXiv.org Artificial Intelligence

Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo tree search for policy optimization (MCTSPO), provides a better exploration-exploitation trade-off through the use of the upper confidence bound heuristic. We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.
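The upper confidence bound heuristic mentioned in the abstract can be illustrated with a minimal sketch of the standard UCB1 selection rule. This is not taken from the paper: the function name `ucb1_select`, its arguments, and the exploration constant `c` are hypothetical, and the rule MCTSPO actually uses may differ in form and constants.

```python
import math


def ucb1_select(visit_counts, mean_returns, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    Hypothetical helper for illustration; not the paper's implementation.
    visit_counts[i] is how often child i was tried, mean_returns[i] is its
    average observed return, and c scales the exploration bonus.
    """
    # Any unvisited child is tried first, so every child gets one sample.
    for i, n in enumerate(visit_counts):
        if n == 0:
            return i
    total = sum(visit_counts)
    # Score = exploitation term (mean return) + exploration bonus that
    # shrinks as a child accumulates visits.
    scores = [
        q + c * math.sqrt(math.log(total) / n)
        for q, n in zip(mean_returns, visit_counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# A rarely visited child can win despite a lower mean return.
choice = ucb1_select([100, 1], [0.6, 0.5])
```

With `c=0` the rule is purely greedy and would pick the first child in the usage line above; a larger `c` shifts the balance toward exploring children with few visits.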

Country:
- North America > United States (0.48)

Genre:
- Research Report (1.00)

Industry:
- Energy > Oil & Gas > Upstream (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (1.00)
  - Representation & Reasoning
    - Search (1.00)
    - Planning & Scheduling (1.00)
