An Information-Theoretic Optimality Principle for Deep Reinforcement Learning

Leibfried, Felix; Grau-Moya, Jordi; Bou-Ammar, Haitham

Feb-8-2018, arXiv.org Machine Learning

We methodologically address the problem of Q-value overestimation in deep reinforcement learning to handle high-dimensional state spaces efficiently. By adapting concepts from information theory, we introduce an intrinsic penalty signal encouraging reduced Q-value estimates. The resultant algorithm encompasses a wide range of learning outcomes containing deep Q-networks as a special case. Different learning outcomes can be demonstrated by tuning a Lagrange multiplier accordingly. We furthermore propose a novel scheduling scheme for this Lagrange multiplier to ensure efficient and robust learning. In experiments on Atari games, our algorithm outperforms other algorithms (e.g.
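The role of the Lagrange multiplier can be illustrated with a small sketch. It assumes, hypothetically, that the penalty is realized as a KL-regularized (free-energy) backup against a prior policy; this excerpt does not spell out the authors' exact operator, so the function `soft_value`, the `prior` argument, and the symbol `lam` are illustrative names rather than the paper's notation. Under that assumption, a large multiplier recovers the hard maximum used by deep Q-networks, while a small one pulls the value toward the prior-weighted average, which is lower.

```python
import numpy as np

def soft_value(q_values, prior, lam):
    """Assumed free-energy backup: (1/lam) * log sum_a prior(a) * exp(lam * Q(a)).

    Hypothetical illustration only; requires lam > 0 and a prior that sums to 1.
    """
    q = np.asarray(q_values, dtype=float)
    p = np.asarray(prior, dtype=float)
    m = q.max()  # subtract the maximum so the exponentials cannot overflow
    return m + np.log(np.sum(p * np.exp(lam * (q - m)))) / lam

if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]        # made-up Q-values for three actions
    uniform = [1 / 3] * 3      # uniform prior policy (an assumption)
    for lam in (1e-3, 1.0, 10.0, 1e3):
        print(lam, soft_value(q, uniform, lam))
```

The result always lies between the prior-weighted mean and the maximum of the Q-values, so sweeping `lam` gives a continuum of more or less conservative targets.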

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > United States (0.29)

Genre:
- Research Report > New Finding (0.68)

Industry:
- Leisure & Entertainment > Games > Computer Games (0.56)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Neural Networks > Deep Learning (0.68)
