Learning to Constrain Policy Optimization with Virtual Trust Region