A Policy Gradient Method with Variance Reduction for Uplift Modeling