POMO: Policy Optimization with Multiple Optima for Reinforcement Learning
Neural Information Processing Systems
Empirically, POMO's low-variance baseline makes RL training fast and stable, and it makes training more resistant to local minima than previous approaches.
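The low-variance baseline can be illustrated with a minimal sketch. It assumes that each problem instance is solved by N rollouts started from different nodes, and that the shared baseline is the mean return of those N rollouts, so every rollout is scored relative to its siblings on the same instance. The function names `shared_baseline_advantages` and `reinforce_loss` are hypothetical. Plain Python lists stand in for the tensors that a real implementation would use.

```python
from typing import List


def shared_baseline_advantages(returns: List[List[float]]) -> List[List[float]]:
    """Subtract a shared baseline (the mean of the instance's N rollout returns) from each return.

    returns[b][i] is the total reward of rollout i on instance b
    (assumed here to be, e.g., the negative tour length of a routing problem).
    """
    advantages = []
    for instance_returns in returns:
        # Shared baseline: one value per instance, computed from all of its rollouts.
        baseline = sum(instance_returns) / len(instance_returns)
        advantages.append([r - baseline for r in instance_returns])
    return advantages


def reinforce_loss(returns: List[List[float]], log_probs: List[List[float]]) -> float:
    """Value of the REINFORCE surrogate loss -mean(advantage * log_prob).

    log_probs[b][i] is the summed log-probability of rollout i (assumed to be
    supplied by the policy). In an autodiff framework, the advantages would be
    treated as constants, and the gradient of this value would be the
    policy-gradient estimate with the shared baseline.
    """
    adv = shared_baseline_advantages(returns)
    total, count = 0.0, 0
    for adv_row, lp_row in zip(adv, log_probs):
        for a, lp in zip(adv_row, lp_row):
            total += -a * lp
            count += 1
    return total / count


if __name__ == "__main__":
    # One instance with four rollouts (hypothetical numbers).
    returns = [[-10.0, -12.0, -8.0, -10.0]]
    log_probs = [[-1.0, -2.0, -0.5, -1.5]]
    print(shared_baseline_advantages(returns))  # [[0.0, -2.0, 2.0, 0.0]]
    print(reinforce_loss(returns, log_probs))   # -0.75
```

Because the baseline is an average over rollouts of the same instance, this design needs no separately trained critic. The advantages for each instance sum to zero, which is one reason such a baseline tends to keep gradient variance low.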
Aug-17-2025, 06:17:27 GMT