Analysis and Improvement of Policy Gradient Estimation