Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning

Gregory Farquhar, Shimon Whiteson, Jakob Foerster

Neural Information Processing Systems 
