Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning
Shixiang (Shane) Gu, Timothy Lillicrap, Richard E. Turner, Zoubin Ghahramani, Bernhard Schölkopf, Sergey Levine
Neural Information Processing Systems
Off-policy model-free deep reinforcement learning methods, which reuse previously collected data, can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. This paper examines, both theoretically and empirically, approaches to merging on- and off-policy updates for deep reinforcement learning.
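The core idea of merging the two update types can be sketched as a convex combination of an on-policy gradient estimate and an off-policy, critic-based gradient estimate, controlled by a mixing coefficient. This is an illustrative sketch only, not the paper's implementation: the names `grad_on`, `grad_off`, and `nu`, and the toy random vectors standing in for real gradient estimates, are all assumptions made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for a single update step (hypothetical values):
# an on-policy likelihood-ratio gradient estimate (e.g., advantage-weighted),
# and an off-policy gradient estimate derived from a learned critic.
grad_on = rng.normal(size=5)
grad_off = rng.normal(size=5)

def interpolated_gradient(grad_on, grad_off, nu):
    """Convex combination of on- and off-policy gradient estimates.

    nu = 0 recovers the pure on-policy update; nu = 1 recovers the
    pure off-policy (critic-based) update; intermediate values trade
    off stability against sample efficiency.
    """
    assert 0.0 <= nu <= 1.0, "mixing coefficient must lie in [0, 1]"
    return (1.0 - nu) * np.asarray(grad_on) + nu * np.asarray(grad_off)

# Mostly on-policy update with a small off-policy correction.
g = interpolated_gradient(grad_on, grad_off, nu=0.2)
```

In practice the two gradient terms would come from a policy network and a critic; the sketch only shows how a single mixing coefficient interpolates between the two estimators.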