Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization
