Variance Reduction Based Experience Replay for Policy Optimization