RoboBrain 2.0 Technical Report

BAAI RoboBrain Team; Cao, Mingyu; Tan, Huajie; Ji, Yuheng; Chen, Xiansheng; Lin, Minglan; Li, Zhiyu; Cao, Zhou; Wang, Pengwei; Zhou, Enshen; Han, Yi; Tang, Yingbo; Xu, Xiangqi; Guo, Wei; Lyu, Yaoxu; Xu, Yijie; Shi, Jiayu; Du, Mengfei; Chi, Cheng; Zhao, Mengdi; Hao, Xiaoshuai; Zhao, Junkai; Zhang, Xiaojie; Rong, Shanyu; Lyu, Huaihai; Cai, Zhengliang; Fu, Yankai; Chen, Ning; Zhang, Bolun; Zhang, Lingfeng; Zhang, Shuyi; Liu, Dong; Feng, Xi; Wang, Songjing; Liu, Xiaodan; Jiao, Yance; Lyu, Mengsi; Chen, Zhuo; He, Chenrui; Ao, Yulong; Sun, Xue; He, Zheqi; Zheng, Jingshu; Yang, Xi; Shi, Donghai; Xie, Kunchang; Zhang, Bochao; Nie, Shaokai; Men, Chunlei; Lin, Yonghua; Wang, Zhongyuan; Huang, Tiejun; Zhang, Shanghang

Sep-16-2025–arXiv.org Artificial Intelligence

We introduce RoboBrain 2.0, our latest generation of embodied vision-language foundation models, designed to unify perception, reasoning, and planning for complex embodied tasks in physical environments. It comes in two variants: a lightweight 7B model and a full-scale 32B model, featuring a heterogeneous architecture with a vision encoder and a language model. Despite its compact size, RoboBrain 2.0 achieves strong performance across a wide spectrum of embodied reasoning tasks. On both spatial and temporal benchmarks, the 32B variant achieves leading results, surpassing prior open-source and proprietary models. In particular, it supports key real-world embodied AI capabilities, including spatial understanding (e.g., affordance prediction, spatial referring, trajectory forecasting) and temporal decision-making (e.g., closed-loop interaction, multi-agent long-horizon planning, and scene graph updating). This report details the model architecture, data construction, multi-stage training strategies, infrastructure and practical applications. We hope RoboBrain 2.0 advances embodied AI research and serves as a practical step toward building generalist embodied agents. The code, checkpoint and benchmark are available at https://superrobobrain.github.io.

Keywords: large language model, machine learning, natural language

Country:
- Europe (0.28)

Genre:
- Workflow (0.67)

Industry:
- Consumer Products & Services (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Robots (1.00)
  - Representation & Reasoning > Agents (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)