VLMimic: Vision Language Models are Visual Imitation Learner for Fine-grained Actions 1

Open in new window