A PPO

Proximal Policy Optimization (PPO) [

Neural Information Processing Systems 

Tasks from left to right: Finger Spin, Cheetah Run, Walker Walk.