Review for NeurIPS paper: High-Throughput Synchronous Deep RL
The baselines are somewhat weak. While TorchBeast is a strong baseline, the PPO and A2C implementations from Kostrikov appear weak. As far as I know, fast training is not a goal of Kostrikov's implementation. For PPO, the implementation from OpenAI Baselines is stronger, featuring parallelization with MPI and all-reduced gradients. For A2C, one could consider rlpyt ("rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch"), which supports various sampling schemes (including batched synchronous sampling) and various optimization schemes.
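For context on the suggested comparison: the core idea behind MPI-based gradient synchronization is that each worker computes a gradient on its own batch, and an all-reduce makes every worker apply the identical averaged update. The following is a minimal NumPy simulation of that averaging step (an illustrative sketch, not the actual OpenAI Baselines or MPI code; `allreduce_mean` is a hypothetical helper name).

```python
import numpy as np

def allreduce_mean(worker_grads):
    """Simulate the effect of an MPI all-reduce (sum, then divide by
    world size): every worker ends up holding the mean gradient."""
    mean = np.mean(worker_grads, axis=0)
    # Each worker receives its own identical copy of the averaged gradient.
    return [mean.copy() for _ in worker_grads]

# Example: 4 workers, each with its own local gradient estimate.
grads = [np.array([1.0, 2.0]), np.array([3.0, 4.0]),
         np.array([5.0, 6.0]), np.array([7.0, 8.0])]
synced = allreduce_mean(grads)
# After the all-reduce, all workers hold the same mean gradient [4.0, 5.0],
# so their parameter updates stay in lockstep.
```

This is what makes the Baselines-style PPO a synchronous data-parallel baseline: throughput scales with the number of workers while the learning dynamics match single-worker training on the combined batch.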