On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations

May-23-2025, 12:42:08 GMT, Neural Information Processing Systems

KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral reference policies derived from expert demonstrations can suffer from pathological training dynamics that can lead to slow, unstable, and suboptimal online learning. We show empirically that the pathology occurs for commonly chosen behavioral policy classes and demonstrate its impact on sample efficiency and online policy performance. Finally, we show that the pathology can be remedied by non-parametric behavioral reference policies and that this allows KL-regularized reinforcement learning to significantly outperform state-of-the-art approaches on a variety of challenging locomotion and dexterous hand manipulation tasks.

artificial intelligence, machine learning, reinforcement learning

Country:
- Europe > United Kingdom
  - England > Oxfordshire > Oxford (0.14)
- North America > United States
  - California > San Francisco County > San Francisco (0.14)

Genre:
- Research Report
  - New Finding (0.46)
  - Promising Solution (0.34)

Industry:
- Education > Educational Setting
  - Online (0.52)
- Health & Medicine > Diagnostic Medicine (0.82)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
