Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning

Neural Information Processing Systems 

In reinforcement learning problems, the aim is to select a controller that will maximize the average reward in some environment.