V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
