Reviews: Scalable Coordinated Exploration in Concurrent Reinforcement Learning

Neural Information Processing Systems 

Main ideas of the submission The authors investigate the problem of efficient coordinated concurrent exploration in environments too large to be addressed by tabular, model-based methods. This is a continuation of [1], where the principles of seed sampling were developed for efficient coordinated concurrent exploration, using a tabular model based algorithm. Since the algorithm was only tested on trivial tasks in [1], the authors first demonstrate the effectiveness of this tabular method on a more challenging problem (swinging up and balancing a pole), compared to trivial extensions of known methods (UCB, Posterior sampling) to the concurrent setting. Following that, they suggest a model-free extension to seeding that is based on function approximation with randomized value functions [9] – a concept that facilitates the combination of the seeding principle with generalization. The authors also suggest some concrete algorithms (SLSVI, STD) that support this concept, show that its performance on the trivial examples of [1] is comparable to that of tabular seed sampling, and show its effectiveness on another pole-balancing problem, which is too difficult to be addressed by tabular methods.