An Information-Theoretic Optimality Principle for Deep Reinforcement Learning
Leibfried, Felix, Grau-Moya, Jordi, Bou-Ammar, Haitham
We methodologically address the problem of Q-value overestimation in deep reinforcement learning, with the aim of handling high-dimensional state spaces efficiently. By adapting concepts from information theory, we introduce an intrinsic penalty signal that encourages reduced Q-value estimates. The resulting algorithm spans a wide range of learning outcomes and contains deep Q-networks as a special case. Different learning outcomes can be obtained by tuning a Lagrange multiplier accordingly. We furthermore propose a novel scheduling scheme for this Lagrange multiplier to ensure efficient and robust learning. In experiments on Atari games, our algorithm outperforms other algorithms (e.g.
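The abstract does not state the update rule. The sketch below shows one standard information-theoretic way such a penalty can lower Q-value targets, and it rests on stated assumptions. We assume a KL-regularized (free-energy) state value V_beta(s) = (1/beta) * log sum_a rho(a|s) * exp(beta * Q(s, a)), where beta is taken to play the role of the Lagrange multiplier and rho is a prior distribution over actions. This value equals max_pi [ E_pi[Q] - KL(pi || rho) / beta ], so the KL term acts as a penalty and V_beta never exceeds max_a Q(s, a). As beta grows, V_beta approaches the DQN max target over the support of rho. As beta shrinks, it approaches the prior-weighted average of Q. The helper names `soft_value` and `soft_td_target` are hypothetical. The sketch does not reproduce the paper's scheduling scheme for the multiplier.

```python
import math
from typing import Sequence


def soft_value(q_values: Sequence[float], prior: Sequence[float], beta: float) -> float:
    """Free-energy state value (1/beta) * log sum_a prior[a] * exp(beta * Q[a]).

    Hypothetical helper (assumed formulation, not taken from the paper's text).
    It is bounded above by max(q_values), so it yields reduced value estimates.
    `prior` is assumed to be a probability distribution over the same actions.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    # Shift by the maximum Q-value before exponentiating to avoid overflow
    # when beta is large (standard log-sum-exp trick).
    q_max = max(q_values)
    total = sum(p * math.exp(beta * (q - q_max)) for q, p in zip(q_values, prior))
    return q_max + math.log(total) / beta


def soft_td_target(
    reward: float,
    next_q_values: Sequence[float],
    prior: Sequence[float],
    beta: float,
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """One-step target r + gamma * V_beta(s'), used in place of DQN's max_a Q(s', a)."""
    if done:
        # Terminal transition: no bootstrapped value.
        return reward
    return reward + gamma * soft_value(next_q_values, prior, beta)


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]
    uniform = [1 / 3] * 3
    for beta in (1e-6, 1.0, 1000.0):
        # Small beta gives roughly the mean of Q; large beta approaches max Q.
        print(beta, soft_value(q, uniform, beta))
```

Running the script with a uniform prior over three actions shows the estimate rising from roughly the mean Q-value (2.0) toward the maximum (3.0) as beta increases, which illustrates how tuning a single multiplier interpolates between a heavily penalized estimate and a DQN-style target.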
Feb-8-2018