Efficient Sample Reuse in Policy Gradients with Parameter-based Exploration