KALIE: Fine-Tuning Vision-Language Models for Open-World Manipulation without Robot Data