Generalizable Imitation Learning Through Pre-Trained Representations
Wei-Di Chang, Francois Hogan, David Meger, Gregory Dudek
In this paper we leverage self-supervised Vision Transformer models and their emergent semantic abilities to improve the generalization of imitation learning policies. We introduce BC-ViT, an imitation learning algorithm that leverages rich patch-level embeddings from DINO pre-trained Vision Transformers (ViTs) to generalize better when learning from demonstrations. Our learner sees the world by clustering appearance features into semantic concepts, forming stable keypoints that generalize across a wide range of appearance variations and object types. We show that this representation enables generalized behaviour by evaluating imitation learning across a diverse dataset of object manipulation tasks. Our method, data, and evaluation approach are made available to facilitate further study of generalization in imitation learners.
arXiv.org Artificial Intelligence
Nov-15-2023
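As a concrete illustration of the pipeline the abstract describes (DINO ViT patch embeddings clustered into semantic keypoints), the following is a minimal sketch, not the authors' released code: the choice of the `dino_vits8` checkpoint, the plain k-means clustering step, and the `semantic_keypoints` helper are all illustrative assumptions.

```python
import torch
from sklearn.cluster import KMeans

# Self-supervised DINO ViT-S/8 from the official facebookresearch hub.
model = torch.hub.load('facebookresearch/dino:main', 'dino_vits8')
model.eval()

@torch.no_grad()
def semantic_keypoints(image, num_keypoints=8, patch=8):
    """image: (1, 3, H, W) float tensor, H and W divisible by `patch`.
    Real images should be ImageNet-normalized before calling this."""
    # Patch tokens from the last transformer block: (1, 1 + N, D); drop CLS.
    tokens = model.get_intermediate_layers(image, n=1)[0][:, 1:, :]
    feats = tokens[0].cpu().numpy()                      # (N, D)

    # Group patch features into semantic clusters (plain k-means here;
    # the paper's exact clustering procedure may differ).
    labels = KMeans(n_clusters=num_keypoints, n_init=10).fit_predict(feats)

    # One keypoint per cluster: the mean pixel location of its patches.
    h, w = image.shape[-2] // patch, image.shape[-1] // patch
    gy, gx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    ys = gy.flatten().float() * patch + patch / 2        # patch centres
    xs = gx.flatten().float() * patch + patch / 2
    keypoints = []
    for c in range(num_keypoints):
        mask = torch.from_numpy(labels == c)
        keypoints.append((xs[mask].mean().item(), ys[mask].mean().item()))
    return keypoints  # [(x, y), ...] in pixel coordinates

# Example: eight candidate keypoints for a (dummy) 224x224 image.
kps = semantic_keypoints(torch.randn(1, 3, 224, 224))
```

A behaviour-cloning policy can then consume the keypoint coordinates rather than raw pixels, which is what makes the learned representation robust to appearance variation.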
- Country:
  - North America > Canada (0.14)
  - Oceania > New Zealand (0.14)
- Genre:
  - Research Report > New Finding (0.46)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning
      - Neural Networks (1.00)
      - Statistical Learning (0.68)
    - Natural Language > Text Processing (0.67)
  - Robots (1.00)