Scalable Coordinated Exploration in Concurrent Reinforcement Learning

Neural Information Processing Systems

We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on the seed sampling concept introduced in Dimakopoulou and Van Roy (2018) and on a randomized value function learning algorithm from Osband et al. (2016). We demonstrate that, for simple tabular contexts, the approach is competitive with those previously proposed in Dimakopoulou and Van Roy (2018), and that, with a higher-dimensional problem and a neural network value function representation, the approach learns quickly with far fewer agents than alternative exploration schemes.


Reviews: Scalable Coordinated Exploration in Concurrent Reinforcement Learning

Neural Information Processing Systems

Main ideas of the submission: The authors investigate the problem of efficient coordinated concurrent exploration in environments too large to be addressed by tabular, model-based methods. This work continues [1], which developed the principles of seed sampling for efficient coordinated concurrent exploration using a tabular, model-based algorithm. Since that algorithm was only tested on trivial tasks in [1], the authors first demonstrate the effectiveness of the tabular method on a more challenging problem (swinging up and balancing a pole), comparing it with trivial extensions of known methods (UCB, posterior sampling) to the concurrent setting. They then propose a model-free extension of seeding based on function approximation with randomized value functions [9], a concept that makes it possible to combine the seeding principle with generalization. The authors also propose concrete algorithms (SLSVI, STD) that support this concept, show that their performance on the trivial examples of [1] is comparable to that of tabular seed sampling, and show their effectiveness on another pole-balancing problem that is too difficult to be addressed by tabular methods.
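To make the idea of combining seeding with randomized value functions more concrete, here is a minimal sketch. It is not the authors' SLSVI or STD algorithm: it reduces the problem to a single regularized least-squares fit on shared data, and all function names, parameters, and the Gaussian prior and noise model are assumptions chosen for illustration. Each agent draws a seed once (a prior parameter sample plus a fixed noise term per observation) and then fits the shared data perturbed by that seed, so agents hold diverse but self-consistent estimates.

```python
import numpy as np

def agent_seed(agent_id, dim, prior_std=1.0):
    """Hypothetical helper: draw an agent's fixed seed (a prior parameter sample)."""
    rng = np.random.default_rng(agent_id)
    return {"id": agent_id, "theta_prior": rng.normal(0.0, prior_std, size=dim)}

def perturbation(seed, obs_index, noise_std=1.0):
    """Noise tied to (agent, observation); the same inputs always give the same value."""
    rng = np.random.default_rng([seed["id"], obs_index])
    return rng.normal(0.0, noise_std)

def fit_perturbed(seed, features, targets, prior_std=1.0, noise_std=1.0):
    """Least squares on seed-perturbed targets, regularized toward the prior sample."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(targets, dtype=float)
    z = np.array([perturbation(seed, i, noise_std) for i in range(len(y))])
    lam = (noise_std / prior_std) ** 2  # assumed ratio of noise to prior variance
    A = X.T @ X + lam * np.eye(X.shape[1])
    b = X.T @ (y + z) + lam * seed["theta_prior"]
    return np.linalg.solve(A, b)

# Shared data seen by every agent (toy values, assumed for illustration).
X = np.eye(3)
y = np.array([1.0, 2.0, 3.0])
estimates = [fit_perturbed(agent_seed(k, 3), X, y) for k in range(4)]
```

Because the seed is fixed per agent, refitting as shared data grows changes each agent's estimate only through the new observations, while different agents keep different perturbations and therefore tend to explore differently.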


Scalable Coordinated Exploration in Concurrent Reinforcement Learning

Neural Information Processing Systems

We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on the seed sampling concept introduced in Dimakopoulou and Van Roy (2018) and on a randomized value function learning algorithm from Osband et al. (2016). We demonstrate that, for simple tabular contexts, the approach is competitive with those previously proposed in Dimakopoulou and Van Roy (2018), and that, with a higher-dimensional problem and a neural network value function representation, the approach learns quickly with far fewer agents than alternative exploration schemes. Papers published at the Neural Information Processing Systems Conference.


Scalable Coordinated Exploration in Concurrent Reinforcement Learning

arXiv.org Artificial Intelligence

We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on seed sampling (Dimakopoulou and Van Roy, 2018) and randomized value function learning (Osband et al., 2016). We demonstrate that, for simple tabular contexts, the approach is competitive with previously proposed tabular model learning methods (Dimakopoulou and Van Roy, 2018). With a higher-dimensional problem and a neural network value function representation, the approach learns quickly with far fewer agents than alternative exploration schemes.