Reviews: Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning
–Neural Information Processing Systems
The authors propose a new policy gradient framework that unifies many previous on-policy and off-policy gradient methods. Many previous policy gradient algorithms can not only be re-derived and but also get improved by the introduction of control variate. Even though this framework introduces bias to gradient updates but, they show in theoretical results that this bias can be bounded. Experiments are very well done and provide enough insights to understand the proposed framework.
deep reinforcement learning, interpolated policy gradient, on-policy and off-policy gradient estimation, (5 more...)
Neural Information Processing Systems
Oct-8-2024, 05:23:04 GMT
- Technology: