Scalable Efficient Deep-RL
Traditional scalable reinforcement learning frameworks, such as IMPALA and R2D2, run multiple agents in parallel to collect transitions, each with its own copy of the model obtained from the parameter server (or learner). This architecture demands high bandwidth, because model parameters, environment information, and other data must be transferred between machines. In this article, we discuss a modern scalable RL agent called SEED (Scalable Efficient Deep-RL), proposed by Espeholt, Marinier, Stanczyk et al. of the Google Brain team.

Here we compare SEED with IMPALA. The IMPALA architecture, which is also used in various forms in Ape-X, OpenAI Rapid, and others, consists of two main parts: a large number of actors that periodically copy model parameters from the learner and interact with environments to collect trajectories, and one or more learners that asynchronously receive those trajectories from the actors and optimize the model.
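To make the actor-learner split concrete, here is a minimal single-process sketch of the data flow, not the actual IMPALA implementation. Everything in it is an assumption made for illustration: the names `Learner`, `Actor`, and `run_toy_actor_learner`, the linear softmax policy, the random toy environment that rewards action 0, and the plain policy-gradient step that stands in for IMPALA's V-trace update. The point is only to show where parameter copies and trajectories travel, and how stale an actor's policy can become between syncs.

```python
import queue

import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


class Learner:
    """Owns the authoritative parameters and consumes trajectories."""

    def __init__(self, obs_dim, num_actions, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.params = rng.normal(scale=0.1, size=(obs_dim, num_actions))
        self.lr = lr
        self.version = 0           # incremented once per update
        self.param_bytes_sent = 0  # traffic caused by parameter copies

    def get_params(self):
        # Every actor sync ships a full copy of the model.
        self.param_bytes_sent += self.params.nbytes
        return self.version, self.params.copy()

    def update(self, trajectory):
        # Stand-in update (plain policy gradient); IMPALA itself uses V-trace.
        _, observations, actions, rewards = trajectory
        grad = np.zeros_like(self.params)
        for obs, action, reward in zip(observations, actions, rewards):
            probs = softmax(obs @ self.params)
            one_hot = np.eye(len(probs))[action]
            grad += reward * np.outer(obs, one_hot - probs)
        self.params += self.lr * grad / len(rewards)
        self.version += 1


class Actor:
    """Runs inference on a local copy of the model and ships trajectories."""

    def __init__(self, learner, traj_queue, unroll_length, seed):
        self.learner = learner
        self.traj_queue = traj_queue
        self.unroll_length = unroll_length
        self.rng = np.random.default_rng(seed)
        self.sync()

    def sync(self):
        self.version, self.params = self.learner.get_params()

    def collect(self):
        observations, actions, rewards = [], [], []
        for _ in range(self.unroll_length):
            obs = self.rng.normal(size=self.params.shape[0])  # toy observation
            probs = softmax(obs @ self.params)
            action = int(self.rng.choice(len(probs), p=probs))
            observations.append(obs)
            actions.append(action)
            rewards.append(1.0 if action == 0 else 0.0)  # toy reward
        self.traj_queue.put((self.version, observations, actions, rewards))


def run_toy_actor_learner(num_actors=4, rounds=6, sync_every=2,
                          obs_dim=8, num_actions=3, unroll_length=5):
    traj_queue = queue.Queue()
    learner = Learner(obs_dim, num_actions)
    actors = [Actor(learner, traj_queue, unroll_length, seed=i + 1)
              for i in range(num_actors)]
    max_policy_lag = 0
    for r in range(rounds):
        for actor in actors:
            if r > 0 and r % sync_every == 0:
                actor.sync()  # periodic parameter copy from the learner
            actor.collect()
        while not traj_queue.empty():
            trajectory = traj_queue.get()
            # How many updates behind was the policy that produced this data?
            max_policy_lag = max(max_policy_lag, learner.version - trajectory[0])
            learner.update(trajectory)
    return learner, max_policy_lag


if __name__ == "__main__":
    learner, lag = run_toy_actor_learner()
    print("updates:", learner.version)
    print("parameter bytes sent:", learner.param_bytes_sent)
    print("max policy lag (in updates):", lag)
```

In this toy setup `param_bytes_sent` grows with both the number of actors and the model size, and trajectories arrive generated by a policy several updates old. In a real distributed system these would be network transfers and genuinely asynchronous processes rather than a loop and an in-memory queue.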