Get Back Here: Robust Imitation by Return-to-Distribution Planning
Geoffrey Cideron, Baruch Tabanpour, Sebastian Curi, Sertan Girgin, Leonard Hussenot, Gabriel Dulac-Arnold, Matthieu Geist, Olivier Pietquin, Robert Dadashi
arXiv.org Artificial Intelligence
Imitation Learning (IL) is a paradigm in sequential decision-making where an agent uses offline expert trajectories to mimic the expert's behavior [1]. While Reinforcement Learning (RL) requires an additional reward signal that can be hard to specify in practice, IL only requires expert trajectories, which can be easier to collect. In part due to its simplicity, IL has been applied successfully in several real-world tasks, from robotic manipulation [2, 3, 4] to autonomous driving [5, 6]. A key challenge in deploying IL, however, is that the agent may encounter states in the final deployment environment that were not labeled by the expert offline [7]. In applications such as healthcare [8, 9] and robotics [10, 11], online experimentation can be risky (e.g., on human patients) or costly to label (e.g., off-policy robotic datasets can take months to collect).
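To make the setup concrete, the sketch below implements the simplest form of IL, behavioral cloning over a discrete state space: fit a policy by majority vote over the expert's actions at each visited state. This is an illustration of "mimicking the expert from offline trajectories" only, not the paper's return-to-distribution method; the function and state names are hypothetical.

```python
from collections import Counter, defaultdict

def behavior_clone(trajectories):
    """Tabular behavioral cloning: for each state seen in the expert
    data, pick the action the expert chose most often there."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for state, action in traj:
            counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Hypothetical expert trajectories: lists of (state, action) pairs.
expert = [
    [("s0", "right"), ("s1", "right")],
    [("s0", "right"), ("s1", "up")],
    [("s1", "right")],
]
policy = behavior_clone(expert)
print(policy["s0"])  # "right" (unanimous in the expert data)
print(policy["s1"])  # "right" (chosen in 2 of 3 expert visits)
print("s2" in policy)  # False: no action for unvisited states
```

The last line illustrates the deployment challenge described above: a cloned policy has no action defined for states the expert never labeled, which is exactly the distribution-shift problem the paper targets.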
May-2-2023