Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines
