Review for NeurIPS paper: High-Throughput Synchronous Deep RL
The baselines are somewhat weak. While TorchBeast is a strong baseline, the PPO and A2C implementations from Kostrikov appear weak. As far as I know, fast training is not a goal of Kostrikov's implementation. For PPO, the implementation from OpenAI Baselines is stronger, featuring parallelization with MPI and all-reduced gradients. For A2C, one could consider rlpyt ("rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch"), which supports various sampling schemes (including batched synchronous sampling) and various optimization schemes.
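For context on the suggested comparison: the core idea behind MPI-based gradient synchronization is that each worker computes a gradient on its own batch, and an all-reduce makes every worker apply the identical averaged update. The following is a minimal NumPy simulation of that averaging step (an illustrative sketch, not the actual OpenAI Baselines or MPI code; `allreduce_mean` is a hypothetical helper name).

```python
import numpy as np

def allreduce_mean(worker_grads):
    """Simulate the effect of an MPI all-reduce (sum, then divide by
    world size): every worker ends up holding the mean gradient."""
    mean = np.mean(worker_grads, axis=0)
    # Each worker receives its own identical copy of the averaged gradient.
    return [mean.copy() for _ in worker_grads]

# Example: 4 workers, each with its own local gradient estimate.
grads = [np.array([1.0, 2.0]), np.array([3.0, 4.0]),
         np.array([5.0, 6.0]), np.array([7.0, 8.0])]
synced = allreduce_mean(grads)
# After the all-reduce, all workers hold the same mean gradient [4.0, 5.0],
# so their parameter updates stay in lockstep.
```

This is what makes the Baselines-style PPO a synchronous data-parallel baseline: throughput scales with the number of workers while the learning dynamics match single-worker training on the combined batch.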