Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy