Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods