Regret Minimization for Partially Observable Deep Reinforcement Learning
Peter Jin, Kurt Keutzer, Sergey Levine
Deep reinforcement learning algorithms that estimate state and state-action value functions have proven effective in a variety of challenging domains, including learning control strategies from raw image pixels. However, such algorithms typically assume a fully observed state and must compensate for partial observations with finite-length observation histories or recurrent networks. In this work, we propose a new deep reinforcement learning algorithm, based on counterfactual regret minimization, that iteratively updates an approximation to an advantage-like function and is robust to partial observability. We demonstrate that this new algorithm can substantially outperform strong baseline methods on several partially observed reinforcement learning tasks: learning first-person 3D navigation in Doom and Minecraft, and acting in the presence of partially observed objects in Doom and Pong.
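The abstract alludes to the regret-matching rule at the heart of counterfactual regret minimization: a policy places probability on each action in proportion to the positive part of an accumulated advantage-like quantity. The sketch below illustrates that rule in a minimal tabular form, assuming a discrete action space; the function and variable names (`regret_matching_policy`, `cumulative_advantage`) are illustrative, and the paper's actual algorithm learns the advantage-like function with deep networks rather than a lookup table.

```python
import numpy as np

def regret_matching_policy(cumulative_advantage):
    """Map per-action cumulative clipped advantages to a policy.

    Actions with positive advantage-like values get probability
    proportional to those values; if none are positive, fall back
    to a uniform policy. This is the standard regret-matching rule,
    shown here only as a tabular illustration.
    """
    positive = np.maximum(cumulative_advantage, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.ones_like(positive) / positive.size

# Example: four actions with mixed advantage estimates.
adv = np.array([0.5, -0.2, 1.5, 0.0])
print(regret_matching_policy(adv))  # -> [0.25 0.   0.75 0.  ]
```

Because the rule depends only on accumulated advantage-like values rather than on an assumed Markov state, it degrades gracefully when the state is only partially observed, which is the property the paper exploits.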
arXiv.org Artificial Intelligence
Oct-24-2018
- Country:
  - North America > Canada > Alberta (0.14)
  - North America > United States > California (0.14)
- Genre:
- Research Report (0.64)
- Industry:
- Leisure & Entertainment > Games > Computer Games (0.36)