Collaborating Authors

Ian Osband



Scalable Coordinated Exploration in Concurrent Reinforcement Learning

Neural Information Processing Systems

We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on seed sampling [1] and randomized value function learning [11]. We demonstrate that, for simple tabular contexts, the approach is competitive with previously proposed tabular model learning methods [1]. With a higher-dimensional problem and a neural network value function representation, the approach learns quickly with far fewer agents than alternative exploration schemes.
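
A minimal sketch of the seed-sampling idea the abstract builds on, under illustrative assumptions (a shared Bayesian linear-Gaussian model, standard-normal seeds, and toy dimensions chosen here; the paper itself targets neural-network value functions): each agent holds a fixed random seed that deterministically maps the shared posterior to that agent's own parameter sample, so agents explore diversely while all learning from the pooled data.

    # Sketch of seed sampling for concurrent agents (illustrative, not the
    # paper's code): a shared Bayesian linear model; each agent's fixed
    # Gaussian seed maps the common posterior to its own parameter sample.
    import numpy as np

    rng = np.random.default_rng(0)
    d, K = 5, 4                          # feature dimension, number of agents
    seeds = rng.standard_normal((K, d))  # one fixed seed per agent

    # Shared posterior state for Bayesian linear regression, prior N(0, I).
    precision = np.eye(d)                # X^T X / sigma^2 + I
    b = np.zeros(d)                      # X^T y / sigma^2

    def agent_sample(k):
        """Map the shared posterior through agent k's fixed seed."""
        cov = np.linalg.inv(precision)
        mean = cov @ b
        L = np.linalg.cholesky(cov)
        return mean + L @ seeds[k]       # deterministic given (data, seed)

    theta_true = rng.standard_normal(d)  # hypothetical ground truth
    noise_var = 0.1 ** 2
    for t in range(100):
        for k in range(K):
            x = rng.standard_normal(d)            # observed feature vector
            value_estimate = x @ agent_sample(k)  # what agent k would act on
            y = x @ theta_true + 0.1 * rng.standard_normal()
            precision += np.outer(x, x) / noise_var   # pool observations
            b += x * y / noise_var

Because the seed stays fixed while the shared posterior updates, each agent's sample concentrates as data accrue, which is the coordination property the abstract refers to.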


Deep Exploration via Bootstrapped DQN

Neural Information Processing Systems

Efficient exploration remains a major challenge for reinforcement learning (RL). Common dithering strategies for exploration, such as ε-greedy, do not carry out temporally-extended (or deep) exploration; this can lead to exponentially larger data requirements. However, most algorithms for statistically efficient RL are not computationally tractable in complex environments. Randomized value functions offer a promising approach to efficient exploration with generalization, but existing algorithms are not compatible with nonlinearly parameterized value functions. As a first step towards addressing such contexts we develop bootstrapped DQN. We demonstrate that bootstrapped DQN can combine deep exploration with deep neural networks for exponentially faster learning than any dithering strategy. In the Arcade Learning Environment bootstrapped DQN substantially improves learning speed and cumulative performance across most games.
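
A minimal sketch of the exploration mechanism in a tabular toy rather than the paper's DQN (the chain environment, ensemble size, mask probability, and learning rate below are all illustrative assumptions): one ensemble head is sampled per episode and followed greedily for the whole episode, giving temporally-extended exploration with no per-step dithering, and Bernoulli masks decide which heads train on each transition (the bootstrap).

    # Self-contained toy of bootstrapped exploration on a reward-at-the-end
    # chain (illustrative; the paper uses deep Q-networks, not tables).
    import numpy as np

    rng = np.random.default_rng(0)
    N, A, K = 10, 2, 5                        # chain length, actions, heads
    Q = rng.standard_normal((K, N, A)) * 0.1  # random init diversifies heads

    for episode in range(200):
        k = rng.integers(K)        # commit to one head for the whole episode
        s = 0
        for t in range(N):
            a = int(np.argmax(Q[k, s]))      # greedy under the sampled head
            s2 = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == N - 1 else 0.0  # reward only at the far end
            mask = rng.random(K) < 0.5       # bootstrap: which heads learn
            for j in np.nonzero(mask)[0]:    # masked TD(0) update per head
                target = r + 0.99 * Q[j, s2].max()
                Q[j, s, a] += 0.1 * (target - Q[j, s, a])
            s = s2

Committing to a single head per episode is what distinguishes this from ε-greedy: the agent can follow a consistent optimistic plan deep into the chain instead of dithering at each step.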


Randomized Prior Functions for Deep Reinforcement Learning

Neural Information Processing Systems

Dealing with uncertainty is essential for efficient reinforcement learning. There is a growing literature on uncertainty estimation for deep learning from fixed datasets, but many of the most popular approaches are poorly suited to sequential decision problems. Other methods, such as bootstrap sampling, have no mechanism for uncertainty that does not come from the observed data. We highlight why this can be a crucial shortcoming and propose a simple remedy through addition of a randomized untrainable 'prior' network to each ensemble member. We prove that this approach is efficient with linear representations, provide simple illustrations of its efficacy with nonlinear representations and show that this approach scales to large-scale problems far better than previous attempts.
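
A minimal sketch of the proposed remedy on a toy 1-D regression problem (the data, prior scale BETA, and the random-feature read-out used for fitting are illustrative assumptions; the paper trains full networks by gradient descent): each ensemble member predicts f_theta(x) + beta * p(x), where p is a fixed, randomly initialized network that is never trained, so members keep disagreeing away from the data.

    # Sketch of randomized prior functions (illustrative assumptions
    # throughout): trainable read-out over fixed random features, plus an
    # untrainable random prior network scaled by BETA.
    import numpy as np

    rng = np.random.default_rng(1)
    BETA, K, H = 3.0, 5, 16        # prior scale, ensemble size, hidden width

    def features(x, W1, b1):
        """Fixed random tanh features; x is a 1-D array of inputs."""
        return np.tanh(W1 @ x[None, :] + b1[:, None])       # shape (H, n)

    x_train = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
    y_train = np.sin(3 * x_train)

    members = []
    for _ in range(K):
        W1, b1 = rng.standard_normal((H, 1)), rng.standard_normal(H)
        Wp1, bp1 = rng.standard_normal((H, 1)), rng.standard_normal(H)
        wp2 = rng.standard_normal(H) / np.sqrt(H)
        prior = lambda x, Wp1=Wp1, bp1=bp1, wp2=wp2: wp2 @ features(x, Wp1, bp1)
        # Fit the trainable part so f(x) + BETA * p(x) matches y: ridge
        # regression of the residual y - BETA * p(x) onto the features.
        Phi = features(x_train, W1, b1)
        resid = y_train - BETA * prior(x_train)
        w = np.linalg.solve(Phi @ Phi.T + 1e-3 * np.eye(H), Phi @ resid)
        members.append(lambda x, w=w, W1=W1, b1=b1, prior=prior:
                       w @ features(x, W1, b1) + BETA * prior(x))

    x_test = np.linspace(-3, 3, 7)
    preds = np.stack([m(x_test) for m in members])
    print(preds.std(axis=0))   # ensemble spread grows away from the data

All members fit the training points, so the printed spread is small there; away from the data the fixed priors dominate and the spread grows. That is the mechanism for uncertainty that does not come from the observed data alone.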

