Exploring the Limits of Vision-Language-Action Manipulation in Cross-task Generalization
