Deep Exploration via Bootstrapped DQN
Osband, Ian, Blundell, Charles, Pritzel, Alexander, Van Roy, Benjamin
Neural Information Processing Systems
Efficient exploration remains a major challenge for reinforcement learning (RL). Common dithering strategies for exploration, such as epsilon-greedy, do not carry out temporally-extended (or deep) exploration; this can lead to exponentially larger data requirements. However, most algorithms for statistically efficient RL are not computationally tractable in complex environments. Randomized value functions offer a promising approach to efficient exploration with generalization, but existing algorithms are not compatible with nonlinearly parameterized value functions. As a first step towards addressing such contexts, we develop bootstrapped DQN.
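The contrast between dithering and randomized value functions can be illustrated with a minimal sketch. It assumes a tabular setting rather than the nonlinear (neural network) value functions the abstract targets, and it assumes that the randomized value function is sampled once per episode and followed greedily. The names BootstrappedQ, start_episode, and mask_prob, and all default values, are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng):
    """Dithering: with probability epsilon take a uniformly random action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


class BootstrappedQ:
    """Several independent tabular Q estimates ("heads"); illustrative only."""

    def __init__(self, n_states, n_actions, n_heads, rng):
        self.rng = rng
        self.n_heads = n_heads
        # Random initial values give each head a different starting estimate.
        self.q = [
            [[rng.gauss(0.0, 1.0) for _ in range(n_actions)]
             for _ in range(n_states)]
            for _ in range(n_heads)
        ]
        self.active = 0

    def start_episode(self):
        # Assumption: sample one head and commit to it for the whole episode,
        # so exploration is consistent over many steps instead of per step.
        self.active = self.rng.randrange(self.n_heads)
        return self.active

    def act(self, state):
        # Act greedily with respect to the currently active head.
        row = self.q[self.active][state]
        return max(range(len(row)), key=lambda a: row[a])

    def update(self, s, a, r, s_next, done, lr=0.1, gamma=0.99, mask_prob=0.5):
        # Each head trains on a transition with probability mask_prob,
        # which approximates giving each head its own bootstrap sample.
        for head in self.q:
            if self.rng.random() < mask_prob:
                target = r if done else r + gamma * max(head[s_next])
                head[s][a] += lr * (target - head[s][a])
```

With epsilon-greedy, random actions are drawn independently at every step, so a long sequence of exploratory actions is unlikely. In the sketch, committing to one sampled head for an episode keeps behavior consistent across steps, which is one way to read "temporally-extended" exploration.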
Feb-14-2020, 15:26:12 GMT