Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes