XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations
