On the Convergence of Discounted Policy Gradient Methods