MoS-VLA: A Vision-Language-Action Model with One-Shot Skill Adaptation
Zhao, Ruihan, Ingebrand, Tyler, Chinchali, Sandeep, Topcu, Ufuk
arXiv.org Artificial Intelligence
Vision-Language-Action (VLA) models trained on large robot datasets promise general-purpose, robust control across diverse domains and embodiments. However, existing approaches often fail out-of-the-box when deployed in novel environments, embodiments, or tasks. We introduce Mixture of Skills VLA (MoS-VLA), a framework that represents robot manipulation policies as linear combinations of a finite set of learned basis functions. During pretraining, MoS-VLA jointly learns these basis functions across datasets from the Open X-Embodiment project, producing a structured skill space. At test time, adapting to a new task requires only a single expert demonstration. The corresponding skill representation is then inferred via a lightweight convex optimization problem that minimizes the L1 action error, without requiring gradient updates. Empirically, MoS-VLA achieves lower action-prediction error on all five unseen datasets and succeeds in both simulation and real-robot tasks where a pretrained VLA model fails outright.

Inspired by the success of large language models, modern robotics aims to achieve generalization and human-like performance through the use of internet-scale data and large, attention-based architectures. To this end, researchers have collected enormous datasets of robotic arm trajectories (Open X-Embodiment Collaboration et al., 2023) and trained so-called vision-language-action foundation models to map natural language task descriptions and state observations to robot actions (Kim et al., 2024; Octo Model Team et al., 2024; Brohan et al., 2023b;a; Ma et al., 2024).
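The one-shot adaptation step described in the abstract (finding the coefficients of a linear combination of basis functions that minimize the L1 action error on a single demonstration) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the basis outputs are plain scalars rather than outputs of a pretrained network, the names `fit_skill_coefficients` and `solve_linear_system` are hypothetical, and the solver is iteratively reweighted least squares (IRLS), a standard way to approximate an L1 fit that is swapped in here because the abstract does not say which convex solver the authors use.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_skill_coefficients(basis_outputs, expert_actions, iters=50, eps=1e-6):
    """Approximate argmin_c sum_t |a_t - sum_i c_i * g_i(o_t)| with IRLS.

    basis_outputs[t][i] is the (assumed scalar) output of basis function i
    at demonstration step t; expert_actions[t] is the expert action a_t.
    The basis outputs stay fixed; only the coefficients c are solved for.
    """
    k = len(basis_outputs[0])
    weights = [1.0] * len(expert_actions)
    coeffs = [0.0] * k
    for _ in range(iters):
        # Weighted normal equations: (G^T W G) c = G^T W a
        A = [[sum(w * g[i] * g[j] for w, g in zip(weights, basis_outputs))
              for j in range(k)] for i in range(k)]
        b = [sum(w * g[i] * a
                 for w, g, a in zip(weights, basis_outputs, expert_actions))
             for i in range(k)]
        coeffs = solve_linear_system(A, b)
        residuals = [a - sum(c * gi for c, gi in zip(coeffs, g))
                     for g, a in zip(basis_outputs, expert_actions)]
        # Reweighting by 1/|r| turns the squared loss into an L1-like loss.
        weights = [1.0 / max(abs(r), eps) for r in residuals]
    return coeffs


# Toy "demonstration": actions generated by the coefficients (0.5, -2.0).
demo_basis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
demo_actions = [0.5 * g[0] - 2.0 * g[1] for g in demo_basis]
demo_coeffs = fit_skill_coefficients(demo_basis, demo_actions)
print(demo_coeffs)
```

In this sketch the basis functions are never updated, which mirrors the abstract's point that adaptation needs no gradient updates to the model: only a small coefficient vector is fitted. Because the L1 problem is a convex program, an exact linear-programming solver could replace the IRLS loop.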
Oct-21-2025
- Country:
  - North America > United States > Texas (0.14)
- Genre:
  - Research Report (0.64)
- Technology:
  - Information Technology > Artificial Intelligence
    - Robots (1.00)
    - Natural Language > Large Language Model (0.67)
    - Representation & Reasoning > Optimization (0.66)
    - Machine Learning
      - Neural Networks (0.47)
      - Inductive Learning (0.46)
      - Statistical Learning > Gradient Descent (0.34)