A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories
Kai Yan, Alexander G. Schwing, Yu-Xiong Wang
Offline imitation from observations aims to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available. Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable. The state-of-the-art "DIstribution Correction Estimation" (DICE) methods minimize the divergence between the state occupancies of the expert and learner policies and retrieve a policy via weighted behavior cloning; however, their results are unstable when learning from incomplete trajectories, due to non-robust optimization in the dual domain. To address this issue, we propose Trajectory-Aware Imitation Learning from Observations (TAILO). TAILO uses a discounted sum along the future trajectory as the weight for weighted behavior cloning. The terms of the sum are scaled by the output of a discriminator trained to identify expert states. Despite its simplicity, TAILO works well whenever trajectories or segments of expert behavior exist in the task-agnostic data, a common assumption in prior work. In experiments across multiple testbeds, we find TAILO to be more robust and effective, particularly with incomplete trajectories.
arXiv.org Artificial Intelligence
Nov 2, 2023
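As a rough sketch of the weighting scheme described in the abstract: the snippet below computes, for each step of a trajectory, a discounted sum over the future of scaled discriminator outputs and uses it as a per-step weight for behavior cloning. This is a minimal illustration, not the authors' implementation. It assumes per-state discriminator scores are already available, uses exp(alpha * d) as the scaling function (an assumption for illustration; the paper's exact transform and discriminator training are not reproduced here), and the names tailo_weights, weighted_bc_loss, gamma, and alpha are hypothetical.

```python
import numpy as np

def tailo_weights(disc_scores, gamma=0.9, alpha=1.0):
    """Per-step weights for weighted behavior cloning: a discounted sum,
    along the future trajectory, of scaled discriminator outputs, via the
    backward recursion w_t = f(d_t) + gamma * w_{t+1}."""
    # f(d) = exp(alpha * d) is an assumed scaling for illustration;
    # the paper's exact transform may differ.
    scaled = np.exp(alpha * np.asarray(disc_scores, dtype=np.float64))
    weights = np.empty_like(scaled)
    running = 0.0
    for t in reversed(range(len(scaled))):
        running = scaled[t] + gamma * running
        weights[t] = running
    return weights

def weighted_bc_loss(log_probs, weights):
    """Weighted behavior cloning objective: -sum_t w_t * log pi(a_t | s_t)."""
    return -np.sum(weights * np.asarray(log_probs))

# Example: a trajectory whose middle segment (indices 5-7) looks expert-like
# to the discriminator. Because each weight sums over *future* states,
# the steps leading into the expert segment (indices 0-4) receive far
# higher weight than the steps after it (indices 8-9) -- the
# "trajectory-aware" property.
d = [0.05] * 5 + [0.90] * 3 + [0.05] * 2
print(tailo_weights(d, gamma=0.9, alpha=3.0))
```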