Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy
