Generalizable Imitation Learning Through Pre-Trained Representations
Wei-Di Chang, Francois Hogan, David Meger, Gregory Dudek
In this paper we leverage self-supervised Vision Transformer models and their emergent semantic abilities to improve the generalization of imitation learning policies. We introduce BC-ViT, an imitation learning algorithm that leverages rich patch-level embeddings from DINO pre-trained Vision Transformers (ViTs) to generalize better when learning from demonstrations. Our learner sees the world by clustering appearance features into semantic concepts, forming stable keypoints that generalize across a wide range of appearance variations and object types. We show that this representation enables generalized behaviour by evaluating imitation learning across a diverse dataset of object manipulation tasks. Our method, data, and evaluation approach are made available to facilitate further study of generalization in imitation learners.
arXiv.org Artificial Intelligence
Nov-15-2023
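As a concrete illustration of the pipeline the abstract describes (DINO ViT patch embeddings clustered into semantic keypoints), the following is a minimal sketch, not the authors' released code: the choice of the `dino_vits8` checkpoint, the plain k-means clustering step, and the `semantic_keypoints` helper are all illustrative assumptions.

```python
import torch
from sklearn.cluster import KMeans

# Self-supervised DINO ViT-S/8 from the official facebookresearch hub.
model = torch.hub.load('facebookresearch/dino:main', 'dino_vits8')
model.eval()

@torch.no_grad()
def semantic_keypoints(image, num_keypoints=8, patch=8):
    """image: (1, 3, H, W) float tensor, H and W divisible by `patch`.
    Real images should be ImageNet-normalized before calling this."""
    # Patch tokens from the last transformer block: (1, 1 + N, D); drop CLS.
    tokens = model.get_intermediate_layers(image, n=1)[0][:, 1:, :]
    feats = tokens[0].cpu().numpy()                      # (N, D)

    # Group patch features into semantic clusters (plain k-means here;
    # the paper's exact clustering procedure may differ).
    labels = KMeans(n_clusters=num_keypoints, n_init=10).fit_predict(feats)

    # One keypoint per cluster: the mean pixel location of its patches.
    h, w = image.shape[-2] // patch, image.shape[-1] // patch
    gy, gx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    ys = gy.flatten().float() * patch + patch / 2        # patch centres
    xs = gx.flatten().float() * patch + patch / 2
    keypoints = []
    for c in range(num_keypoints):
        mask = torch.from_numpy(labels == c)
        keypoints.append((xs[mask].mean().item(), ys[mask].mean().item()))
    return keypoints  # [(x, y), ...] in pixel coordinates

# Example: eight candidate keypoints for a (dummy) 224x224 image.
kps = semantic_keypoints(torch.randn(1, 3, 224, 224))
```

A behaviour-cloning policy can then consume the keypoint coordinates rather than raw pixels, which is what makes the learned representation robust to appearance variation.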
- Country:
  - North America > Canada (0.14)
  - Oceania > New Zealand (0.14)
- Genre:
  - Research Report > New Finding (0.46)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning
      - Neural Networks (1.00)
      - Statistical Learning (0.68)
    - Natural Language > Text Processing (0.67)
  - Robots (1.00)