Review for NeurIPS paper: POMO: Policy Optimization with Multiple Optima for Reinforcement Learning


Correctness: The discussion of baselines for POMO seems to me a bit misleading, though this is somewhat of a nit. First, the use of "traditionally" is incorrect: the earliest work (including, if I recall correctly, the REINFORCE paper) makes use of a rolling-average baseline. Newer works do use more complicated baselines, but for a reason!
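To make the point concrete, here is a minimal sketch of what I mean by a rolling-average baseline: plain REINFORCE on a toy two-armed bandit where the advantage is computed against an exponential moving average of past rewards, rather than a learned critic or a greedy-rollout baseline. The bandit, reward values, and hyperparameters are illustrative choices, not anything from the paper under review.

```python
import math
import random

def reinforce_with_ema_baseline(steps=2000, lr=0.1, beta=0.9, seed=0):
    """REINFORCE on a toy 2-armed bandit with a rolling-average baseline.

    Hypothetical setup for illustration: arm 0 pays ~0.2, arm 1 pays ~1.0.
    The baseline is an exponential moving average of observed rewards,
    in the spirit of the earliest REINFORCE-style work.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]   # logits of a softmax policy over the two arms
    baseline = 0.0       # rolling-average baseline

    for _ in range(steps):
        # Softmax policy (numerically stabilized).
        m = max(theta)
        exps = [math.exp(x - m) for x in theta]
        z = sum(exps)
        probs = [e / z for e in exps]

        # Sample an action and observe a noisy reward.
        a = 0 if rng.random() < probs[0] else 1
        reward = (0.2 if a == 0 else 1.0) + rng.gauss(0, 0.1)

        # Advantage relative to the rolling average, then update it.
        adv = reward - baseline
        baseline = beta * baseline + (1 - beta) * reward

        # Policy-gradient step: grad of log pi(a) is onehot(a) - probs.
        for i in range(2):
            grad = ((1.0 if i == a else 0.0) - probs[i]) * adv
            theta[i] += lr * grad

    return probs

probs = reinforce_with_ema_baseline()
# The policy should concentrate most probability on the better arm.
```

Even this crude baseline suffices here; the case for the more elaborate baselines in newer works (including POMO's shared-trajectory baseline) rests on variance reduction in harder settings, not on rolling averages being a recent default.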