Reviews: Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

Neural Information Processing Systems 

The authors propose a new policy gradient framework that unifies many previous on-policy and off-policy gradient methods. Many previous policy gradient algorithms can not only be re-derived and but also get improved by the introduction of control variate. Even though this framework introduces bias to gradient updates but, they show in theoretical results that this bias can be bounded. Experiments are very well done and provide enough insights to understand the proposed framework.