Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration

Zhao, Heyang, Yu, Xingrui, Bossens, David M., Tsang, Ivor W., Gu, Quanquan

Jun-26-2025–arXiv.org Artificial Intelligence

Imitation learning is a central problem in reinforcement learning where the goal is to learn a policy that mimics the expert's behavior. In practice, it is often challenging to accurately learn the expert policy from a limited number of demonstrations because of the complexity of the state space. Moreover, it is essential to explore the environment and collect data to achieve beyond-expert performance. To overcome these challenges, we propose a novel imitation learning algorithm called Imitation Learning with Double Exploration (ILDE), which implements exploration in two aspects: (1) optimistic policy optimization via an exploration bonus that rewards state-action pairs with high uncertainty, to potentially improve convergence to the expert policy, and (2) curiosity-driven exploration of states that deviate from the demonstration trajectories, to potentially yield beyond-expert performance. Empirically, we demonstrate that ILDE outperforms state-of-the-art imitation learning algorithms in terms of sample efficiency and achieves beyond-expert performance on Atari and MuJoCo tasks with fewer demonstrations than in previous work. We also provide a theoretical justification of ILDE as an uncertainty-regularized policy optimization method with optimistic exploration, leading to regret that grows sublinearly in the number of episodes.
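The two exploration signals described in the abstract can be illustrated with a toy reward-shaping sketch. This is not the ILDE implementation: the count-based uncertainty bonus, the nearest-demonstration distance used as a curiosity signal, and the names `optimism_bonus`, `curiosity_bonus`, `shaped_reward`, `beta`, and `lam` are all assumptions chosen for illustration only.

```python
import math
from collections import Counter


def optimism_bonus(counts, state, action):
    """Assumed uncertainty proxy: larger for rarely visited (state, action) pairs."""
    return 1.0 / math.sqrt(counts[(state, action)] + 1)


def curiosity_bonus(state, demo_states):
    """Assumed curiosity proxy: distance to the nearest demonstration state."""
    return min(math.dist(state, d) for d in demo_states)


def shaped_reward(imitation_reward, counts, state, action, demo_states,
                  beta=0.5, lam=0.1):
    """Imitation reward plus the two hypothetical exploration bonuses."""
    return (imitation_reward
            + beta * optimism_bonus(counts, state, action)
            + lam * curiosity_bonus(state, demo_states))


# Toy data: two demonstration states and one frequently visited pair.
demo_states = [(0.0, 0.0), (1.0, 0.0)]
counts = Counter({((0.0, 0.0), 0): 3})

# A well-visited pair on the demonstration trajectory.
r_seen = shaped_reward(1.0, counts, (0.0, 0.0), 0, demo_states)
# An unvisited pair far from the demonstrations receives a larger shaped reward.
r_novel = shaped_reward(1.0, counts, (1.0, 2.0), 1, demo_states)
print(r_novel > r_seen)
```

In this sketch both bonuses are simply added to the imitation reward with fixed weights; how the actual method estimates uncertainty and curiosity, and how it weights them, is not specified in the abstract.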

demonstration, machine learning, reinforcement learning
