KALIE: Fine-Tuning Vision-Language Models for Open-World Manipulation without Robot Data
