Get Back Here: Robust Imitation by Return-to-Distribution Planning

Geoffrey Cideron, Baruch Tabanpour, Sebastian Curi, Sertan Girgin, Leonard Hussenot, Gabriel Dulac-Arnold, Matthieu Geist, Olivier Pietquin, Robert Dadashi

arXiv.org (Artificial Intelligence)

Imitation Learning (IL) is a paradigm in sequential decision making where an agent uses offline expert trajectories to mimic the expert's behavior [1]. Whereas Reinforcement Learning (RL) requires a reward signal that can be hard to specify in practice, IL requires only expert trajectories, which are often easier to collect. In part due to this simplicity, IL has been applied successfully to several real-world tasks, from robotic manipulation [2, 3, 4] to autonomous driving [5, 6]. A key challenge in deploying IL, however, is that at deployment time the agent may encounter states that the expert never labeled offline [7]. In applications such as healthcare [8, 9] and robotics [10, 11], online experimentation can be risky (e.g., on human patients) or costly to label (e.g., off-policy robotic datasets can take months to collect).
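To make the IL setup described above concrete, here is a minimal sketch of the simplest IL recipe, behavioral cloning: the agent regresses the expert's actions from the expert's states using only offline trajectories and no reward signal. This illustrates the paradigm, not the return-to-distribution method proposed in the paper; the dataset, network shapes, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical offline expert dataset of (state, action) pairs.
# Shapes are illustrative assumptions, not taken from the paper.
expert_states = torch.randn(1000, 8)   # 1000 states, 8-dim observations
expert_actions = torch.randn(1000, 2)  # matching 2-dim continuous actions

# A small policy network mapping states to actions.
policy = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Behavioral cloning: supervised regression of expert actions from
# expert states. No reward signal is needed, only offline trajectories.
for epoch in range(100):
    pred_actions = policy(expert_states)
    loss = nn.functional.mse_loss(pred_actions, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because such a policy is trained only on expert-visited states, small action errors at deployment can compound and drift the agent into states the expert never labeled, which is exactly the distribution-shift failure mode the abstract highlights.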
