Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods
