Policy Gradient with Active Importance Sampling