Sample Complexity Bounds for Iterative Stochastic Policy Optimization