Deep Exploration via Bootstrapped DQN
Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy
Neural Information Processing Systems
Efficient exploration remains a major challenge for reinforcement learning (RL). Common dithering strategies for exploration, such as ε-greedy, do not carry out temporally-extended (or deep) exploration; this can lead to exponentially larger data requirements. However, most algorithms for statistically efficient RL are not computationally tractable in complex environments. Randomized value functions offer a promising approach to efficient exploration with generalization, but existing algorithms are not compatible with nonlinearly parameterized value functions. As a first step towards addressing such contexts we develop bootstrapped DQN. We demonstrate that bootstrapped DQN can combine deep exploration with deep neural networks for exponentially faster learning than any dithering strategy. In the Arcade Learning Environment bootstrapped DQN substantially improves learning speed and cumulative performance across most games.
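A minimal sketch of the core idea, not the authors' implementation: a shared network torso feeds K independent Q-value heads, and one head is sampled at the start of each episode and followed greedily, giving temporally-extended exploration instead of per-step dithering. The layer sizes, the PyTorch framing, and the `env.reset()`/`env.step()` interface below are illustrative assumptions.

```python
import random
import torch
import torch.nn as nn


class BootstrappedQNetwork(nn.Module):
    """Shared torso with K bootstrap heads, each an independent Q estimate."""

    def __init__(self, obs_dim: int, n_actions: int, n_heads: int = 10):
        super().__init__()
        # Placeholder MLP torso; the paper uses a convolutional net on Atari.
        self.torso = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(64, n_actions) for _ in range(n_heads)]
        )

    def forward(self, obs: torch.Tensor, head: int) -> torch.Tensor:
        return self.heads[head](self.torso(obs))


def run_episode(env, q_net: BootstrappedQNetwork, n_heads: int = 10):
    """Sample one head per episode and act greedily with respect to it."""
    head = random.randrange(n_heads)  # committed to for the whole episode
    obs, done = env.reset(), False    # assumed environment interface
    while not done:
        with torch.no_grad():
            q_values = q_net(torch.as_tensor(obs, dtype=torch.float32), head)
        action = int(q_values.argmax())
        obs, reward, done = env.step(action)
    # Training (not shown) would update each head on its own bootstrapped
    # subsample of replay data, e.g. via a per-head Bernoulli mask.
```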
- Genre:
- Research Report (0.67)
- Industry:
- Education (0.49)
- Leisure & Entertainment > Games (0.47)
- Technology: