Coherent Soft Imitation Learning

Joe Watson, Sandy H. Huang, Nicolas Heess

Mar-20-2025, 06:58:57 GMT–Neural Information Processing Systems

Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) for the policy or inverse reinforcement learning (IRL) for the reward. Such methods enable agents to learn complex tasks from humans that are difficult to capture with hand-designed reward functions. Choosing between BC and IRL for imitation depends on the quality and state-action coverage of the demonstrations, as well as additional access to the Markov decision process. Hybrid strategies that combine BC and IRL are rare, as initial policy optimization against inaccurate rewards diminishes the benefit of pretraining the policy with BC. This work derives an imitation method that captures the strengths of both BC and IRL.

demonstration, learning, regularization, (14 more...)

