Learning Optimal Deterministic Policies with Stochastic Policy Gradients