VLMimic: Vision Language Models are Visual Imitation Learner for Fine-grained Actions

Neural Information Processing Systems 

These skills are refined and updated through an iterative comparison strategy, enabling efficient adaptation to unseen environments.