Reusing Trajectories in Policy Gradients Enables Fast Convergence