Action-dependent Control Variates for Policy Optimization via Stein's Identity
