XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations

Fan, Shichao, Wu, Kun, Che, Zhengping, Wang, Xinhua, Wu, Di, Liao, Fei, Liu, Ning, Zhang, Yixue, Zhao, Zhen, Xu, Zhiyuan, Li, Meng, Liu, Qingjie, Zhang, Shanghang, Wan, Min, Tang, Jian

arXiv.org Artificial Intelligence

Recent progress in large-scale robotic datasets and vision-language models (VLMs) has advanced research on vision-language-action (VLA) models. However, existing VLA models still face two fundamental challenges: (i) producing precise low-level actions from high-dimensional observations, and (ii) bridging domain gaps across heterogeneous data sources, including diverse robot embodiments and human demonstrations. Existing methods often encode latent variables from either visual dynamics or robotic actions to guide policy learning, but they fail to fully exploit the complementary multi-modal knowledge present in large-scale, heterogeneous datasets. In this work, we present X Robotic Model 1 (XR-1), a novel framework for versatile and scalable VLA learning across diverse robots, tasks, and environments. XR-1 introduces the Unified Vision-Motion Codes (UVMC), a discrete latent representation learned via a dual-branch VQ-VAE that jointly encodes visual dynamics and robotic motion. UVMC addresses these challenges by (i) serving as an intermediate representation between observations and actions, and (ii) aligning multimodal dynamic information from heterogeneous data sources to capture complementary knowledge. To effectively exploit UVMC, we propose a three-stage training paradigm: (i) self-supervised UVMC learning, (ii) UVMC-guided pretraining on large-scale cross-embodiment robotic datasets, and (iii) task-specific post-training. We validate XR-1 through extensive real-world experiments with more than 14,000 rollouts on six different robot embodiments, spanning over 120 diverse manipulation tasks. XR-1 consistently outperforms state-of-the-art baselines such as $π_{0.5}$, $π_0$, RDT, UniVLA, and GR00T-N1.5, while demonstrating strong generalization to novel objects, background variations, distractors, and illumination changes. Our project is at https://xr-1-vla.github.io/.
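To make the dual-branch VQ-VAE idea concrete, here is a minimal PyTorch sketch in the spirit of UVMC: one branch encodes visual dynamics (the change between frame features), the other encodes a chunk of robot actions, and both are quantized against a shared codebook so the discrete codes align the two modalities. The abstract does not specify the architecture, so every module name, dimension, loss weight, and the shared-codebook and alignment-loss choices here are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of a dual-branch VQ-VAE for UVMC-style codes.
# Dimensions, module names, and losses are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Nearest-neighbor codebook lookup with a straight-through estimator."""
    def __init__(self, num_codes=512, dim=256, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z):                          # z: (B, dim)
        dists = torch.cdist(z, self.codebook.weight)   # (B, num_codes)
        idx = dists.argmin(dim=-1)                 # discrete code indices
        z_q = self.codebook(idx)                   # (B, dim)
        # Codebook and commitment losses, as in the original VQ-VAE.
        vq_loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()               # straight-through gradient
        return z_q, idx, vq_loss

class DualBranchVQVAE(nn.Module):
    """Vision branch encodes frame-feature dynamics; motion branch encodes
    an action chunk. Both quantize into one shared discrete code space."""
    def __init__(self, obs_dim=768, act_dim=7, chunk=16, dim=256):
        super().__init__()
        self.vis_enc = nn.Sequential(nn.Linear(2 * obs_dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.mot_enc = nn.Sequential(nn.Linear(act_dim * chunk, dim), nn.GELU(), nn.Linear(dim, dim))
        self.vq = VectorQuantizer(dim=dim)
        self.vis_dec = nn.Linear(dim, obs_dim)            # predicts future-frame features
        self.mot_dec = nn.Linear(dim, act_dim * chunk)    # reconstructs the action chunk

    def forward(self, feat_t, feat_tk, actions):
        # feat_t, feat_tk: (B, obs_dim) visual features at times t and t+k
        # actions: (B, chunk, act_dim) robot actions between the two frames
        z_v = self.vis_enc(torch.cat([feat_t, feat_tk], dim=-1))
        z_m = self.mot_enc(actions.flatten(1))
        zq_v, _, l_v = self.vq(z_v)
        zq_m, _, l_m = self.vq(z_m)
        recon = (F.mse_loss(self.vis_dec(zq_v), feat_tk)
                 + F.mse_loss(self.mot_dec(zq_m), actions.flatten(1)))
        align = F.mse_loss(z_v, z_m.detach())      # pull the branches together
        return recon + l_v + l_m + align

# Smoke test with random tensors (shapes only; no real data involved).
model = DualBranchVQVAE()
loss = model(torch.randn(8, 768), torch.randn(8, 768), torch.randn(8, 16, 7))
loss.backward()
```

Training such a model self-supervised would correspond to the first of the paper's three stages; the resulting discrete codes could then supervise VLA pretraining as an intermediate target between observations and actions, again as a sketch of the stated design rather than a reproduction of it.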


MWC19 Los Angeles: First-ever humanoid robot powered by cloud artificial intelligence

#artificialintelligence

Who needs to struggle with a delicate little sewing needle when there's now a robot that can thread it for you? The XR-1 robot, one of the first of its kind, is powered by cloud artificial intelligence (AI), Sprint True Mobile 5G, and proprietary vision-controlled grasping technology, which means it can not only thread a needle but also serve drinks and be programmed for other tasks, including manufacturing. The revolutionary XR-1 is a service robot that also leverages human operator input for constant learning. "Overall, intelligent cloud robots paint the most vibrant picture of how 5G's ultra-low latency, exponentially faster speeds, and wider reach can dramatically improve response time and enable a new world of applications," said Bill Huang, founder and CEO of CloudMinds, in a release. CloudMinds' Virtual Backbone Network (VBN) combines high-performance, low-latency fixed- and mobile-network technology, blockchain technologies, and other innovations to manage cloud robotics over connectivity completely isolated from the internet, guaranteeing security.


CloudMinds XR-1: One of the First Intelligent 5G Humanoid Robots Awakens with Sprint at MWC Los Angeles 2019

#artificialintelligence

CloudMinds Technology Inc. – a global pioneer in cloud artificial intelligence architecture that makes robots and businesses smarter for the benefit of all humanity – will have its revolutionary XR-1 robot interact with guests at the Sprint exhibit (South Hall #1702) at Mobile World Congress Los Angeles, Oct. 22 to 24. XR-1 is one of the first-ever humanoid robots powered by cloud artificial intelligence, commercial Sprint True Mobile 5G, and proprietary vision-controlled grasping technology for service robots, and it also leverages human operator input for constant learning. "Overall, intelligent cloud robots paint the most vibrant picture of how 5G's ultra-low latency, exponentially faster speeds and wider reach can dramatically improve response time and enable a new world of applications," said Bill Huang, founder and CEO of CloudMinds. "With vision-controlled grasping and the ability to perform intricate tasks, the XR-1 simply raises the bar and lays the foundation for an even wider range of intelligent compliant cloud service robots from CloudMinds – from wheeled to two-legged form factors. We are proud to be ushering in a new era of helpful robots for homes and businesses, with an emphasis on the importance of human input."