Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

Gu, Shixiang (Shane), Lillicrap, Timothy, Turner, Richard E., Ghahramani, Zoubin, Schölkopf, Bernhard, Levine, Sergey

Feb-14-2020, 13:58:52 GMT, Neural Information Processing Systems

Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. This paper examines, both theoretically and empirically, approaches to merging on- and off-policy updates for deep reinforcement learning. Theoretical results show that off-policy updates with a value function estimator can be interpolated with on-policy policy gradient updates whilst still satisfying performance bounds. Our analysis uses control variate methods to produce a family of policy gradient algorithms, with several recently proposed algorithms being special cases of this family.
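
As a rough sketch of what such an interpolation can look like, the expression below mixes a likelihood-ratio (on-policy) term with a critic-based (off-policy) term. The notation is assumed here for illustration and does not appear in the abstract: $\nu \in [0, 1]$ is an interpolation weight, $\hat{A}$ is an on-policy advantage estimate, $Q_w$ is a learned critic fitted on previously collected data, $\rho^{\pi}$ is the on-policy state distribution, and $\rho^{\beta}$ is the state distribution of the stored data.

```latex
\nabla_\theta J(\theta) \;\approx\;
  (1 - \nu)\, \mathbb{E}_{s \sim \rho^{\pi},\, a \sim \pi_\theta}
    \left[ \nabla_\theta \log \pi_\theta(a \mid s)\, \hat{A}(s, a) \right]
  \;+\;
  \nu\, \mathbb{E}_{s \sim \rho^{\beta}}
    \left[ \nabla_\theta\, \mathbb{E}_{a \sim \pi_\theta} \left[ Q_w(s, a) \right] \right]
```

Under these assumptions, $\nu = 0$ reduces to an ordinary on-policy policy gradient, while $\nu = 1$ gives a purely critic-driven update of the actor-critic type; intermediate values trade the bias of the critic term against the variance of the likelihood-ratio term.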

deep reinforcement learning, interpolated policy gradient, on-policy and off-policy gradient estimation, (4 more...)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)