41128e5b3a7622da5b17588757599077-Paper-Conference.pdf

Jun-16-2026, 18:56:39 GMT–Neural Information Processing Systems

In this work, we introduce ChatVLA-2, a novel mixture-ofexpert VLA model coupled with a specialized two-stage training pipeline designed to preserve the VLM's original strengths while enabling actionable reasoning. To validate our approach, we design a math-matching task wherein a robot interprets math problems written on a whiteboard and picks corresponding number cards from a table to solve equations. Remarkably, our method exhibits exceptional mathematical reasoning and OCR capabilities, despite these abilities not being explicitly trained within the VLA. Furthermore, we demonstrate that the VLA possesses strong spatial reasoning skills, enabling it to interpret novel directional instructions involving previously unseen objects. Overall, our method showcases reasoning and comprehension abilities that significantly surpass state-of-the-art imitation learning methods such as OpenVLA, DexVLA, and π0. This work represents a substantial advancement toward developing truly generalizable robotic foundation models endowed with robust reasoning capacities.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Jun-16-2026, 18:56:39 GMT

Conferences PDF

Add feedback

Country:
- Europe (0.46)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Information Technology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (0.47)
  - Machine Learning > Neural Networks (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found