Scalable Coordinated Exploration in Concurrent Reinforcement Learning
Maria Dimakopoulou, Ian Osband, Benjamin Van Roy
Neural Information Processing Systems
We consider a team of reinforcement learning agents that operate concurrently in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on seed sampling [1] and randomized value function learning [11]. We demonstrate that, in simple tabular contexts, the approach is competitive with previously proposed tabular model-learning methods [1]. On a higher-dimensional problem with a neural network value function representation, the approach learns quickly with far fewer agents than alternative exploration schemes.
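To make the seed-sampling idea concrete: each agent draws an independent random "seed" that fixes a sampled prior and a perturbation of every shared observation, so agents fit diverse but internally consistent value estimates from the same shared data. The sketch below is an illustrative toy (a shared-buffer bandit with names like `SeededAgent` invented here), not the authors' implementation or their full MDP setting.

```python
import numpy as np

# Toy sketch of seed-based coordinated exploration (illustrative only).
# Each agent's seed fixes (a) a random prior over action values and
# (b) a deterministic Gaussian perturbation of each shared observation.
# All agents learn from the SAME shared buffer, yet their seeds commit
# them to diverse, self-consistent estimates -- coordinated exploration
# with no communication beyond the shared data.

rng = np.random.default_rng(0)
n_actions, n_agents, n_rounds = 5, 4, 200
true_means = rng.normal(size=n_actions)   # unknown arm means
buffer = []                               # shared (action, reward) history

class SeededAgent:
    def __init__(self, seed, sigma_prior=1.0, sigma_noise=0.5):
        self.rng = np.random.default_rng(seed)
        # Sampled prior over action values, fixed by the seed.
        self.prior = self.rng.normal(0.0, sigma_prior, n_actions)
        # Seed that fixes this agent's reward perturbations.
        self.noise_seed = int(self.rng.integers(1 << 31))
        self.sigma_noise = sigma_noise

    def act(self, shared_buffer):
        # Re-derive the same perturbation sequence every time, so the
        # agent's view of the data is consistent across decisions.
        noise_rng = np.random.default_rng(self.noise_seed)
        totals = self.prior.copy()
        counts = np.ones(n_actions)
        for a, r in shared_buffer:
            totals[a] += r + noise_rng.normal(0.0, self.sigma_noise)
            counts[a] += 1
        return int(np.argmax(totals / counts))  # greedy w.r.t. perturbed fit

agents = [SeededAgent(seed=k) for k in range(n_agents)]
for t in range(n_rounds):
    for agent in agents:
        a = agent.act(buffer)
        reward = true_means[a] + rng.normal(0.0, 0.5)
        buffer.append((a, reward))  # all agents share this experience
```

Because each agent's perturbations are a fixed function of its seed, its behavior stays coherent as the shared buffer grows, while differing seeds keep the team's exploration diverse.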