Review for NeurIPS paper: POMO: Policy Optimization with Multiple Optima for Reinforcement Learning
–Neural Information Processing Systems
Correctness: The discussion of baselines for POMO is, to me, a bit misleading. This is somewhat of a nit, though. First, the use of "traditionally" is incorrect. The earliest work (including the REINFORCE paper, if I recall correctly) makes use of a rolling-average baseline. Newer works do use more complicated baselines, but for a reason!
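For readers unfamiliar with the term, here is a minimal sketch of what a rolling-average baseline could look like in a REINFORCE-style update. It is an illustration only, not the code of POMO or of the REINFORCE paper. The toy Gaussian bandit, the exponential-moving-average form of the baseline, and every name and hyperparameter (`update_baseline`, `train_bandit`, `decay`, `lr`) are assumptions made for this example.

```python
import math
import random


def update_baseline(baseline, reward, decay=0.9):
    """Exponential moving average of observed rewards (assumed form of the rolling average)."""
    return decay * baseline + (1.0 - decay) * reward


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(arm_means, steps=2000, lr=0.1, decay=0.9, seed=0):
    """REINFORCE on a toy Gaussian bandit (hypothetical setup) with a rolling-average baseline."""
    rng = random.Random(seed)
    logits = [0.0] * len(arm_means)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], 1.0)
        # The baseline is subtracted from the reward before the gradient step.
        advantage = reward - baseline
        for i in range(len(logits)):
            # Gradient of log softmax w.r.t. logit i: one_hot(action)[i] - probs[i]
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_log_prob
        # Update the baseline only after it has been used for this step.
        baseline = update_baseline(baseline, reward, decay)
    return softmax(logits), baseline
```

The baseline here depends only on past rewards, not on the current action, so subtracting it does not bias the gradient estimate. The more elaborate baselines in newer work aim to reduce variance further than this simple average can.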
Feb-8-2025, 03:23:07 GMT