Zero-Shot Offline Imitation Learning via Optimal Transport
Rupf, Thomas, Bagatella, Marco, Gürtler, Nico, Frey, Jonas, Martius, Georg
Zero-shot imitation learning algorithms hold the promise of reproducing unseen behavior from as little as a single demonstration at test time. Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector and a low-level goal-conditioned policy. However, this framework can suffer from myopic behavior: the agent's immediate actions towards achieving individual goals may undermine long-term objectives. We introduce a novel method that mitigates this issue by directly optimizing the occupancy matching objective that is intrinsic to imitation learning. We propose to lift a goal-conditioned value function to a distance between occupancies, which are in turn approximated via a learned world model. The resulting method can learn from offline, suboptimal data, and is capable of non-myopic, zero-shot imitation, as we demonstrate in complex, continuous benchmarks.
arXiv.org Artificial Intelligence
Oct-11-2024
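The abstract's core idea is to score candidate behavior by an occupancy-matching objective: an optimal-transport distance between states imagined by a learned world model and the goals in the expert demonstration, with a learned goal-conditioned value function supplying the ground cost. Below is a minimal, self-contained sketch of that idea, not the authors' implementation: `sinkhorn_ot_cost`, `imitation_cost`, and `temporal_distance` are illustrative names assumed for this example.

```python
# Illustrative sketch (assumed API, not the paper's code): entropic OT between
# a world-model rollout and expert demonstration goals, with a learned
# goal-conditioned temporal distance as the ground cost.
import numpy as np

def sinkhorn_ot_cost(cost, reg=0.1, n_iters=200):
    """Entropic OT cost between two uniform empirical measures.

    cost: (n, m) ground-cost matrix between rollout states and demo goals.
    Returns <P, C> for the approximate optimal transport plan P.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform weight on rollout states
    b = np.full(m, 1.0 / m)          # uniform weight on demonstration goals
    # Scale the regularizer by the cost magnitude so the Gibbs kernel stays well-conditioned.
    K = np.exp(-cost / (reg * (cost.max() + 1e-8)))
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):         # Sinkhorn fixed-point iterations
        u = a / (K @ v + 1e-12)
        v = b / (K.T @ u + 1e-12)
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost))

def imitation_cost(rollout_states, demo_goals, temporal_distance):
    """Occupancy-matching objective for one imagined rollout.

    rollout_states: states predicted by the learned world model for a
                    candidate action sequence, shape (n, state_dim).
    demo_goals:     goals extracted from the expert demonstration, (m, goal_dim).
    temporal_distance: callable (s, g) -> estimated steps from s to g, e.g.
                    derived from a goal-conditioned value function.
    """
    cost = np.array([[temporal_distance(s, g) for g in demo_goals]
                     for s in rollout_states])
    return sinkhorn_ot_cost(cost)
```

Under this sketch, a planner searching over action sequences in the learned world model would pick the sequence minimizing `imitation_cost`, matching the demonstration's occupancy as a whole rather than chasing one goal at a time, which is how the non-myopic behavior described in the abstract would arise.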