MoS-VLA: A Vision-Language-Action Model with One-Shot Skill Adaptation

Ruihan Zhao, Tyler Ingebrand, Sandeep Chinchali, Ufuk Topcu

Oct-21-2025–arXiv.org Artificial Intelligence

Vision-Language-Action (VLA) models trained on large robot datasets promise general-purpose, robust control across diverse domains and embodiments. However, existing approaches often fail out of the box when deployed in novel environments, embodiments, or tasks. We introduce Mixture of Skills VLA (MoS-VLA), a framework that represents robot manipulation policies as linear combinations of a finite set of learned basis functions. During pretraining, MoS-VLA jointly learns these basis functions across datasets from the Open X-Embodiment project, producing a structured skill space. At test time, adapting to a new task requires only a single expert demonstration. The corresponding skill representation is then inferred via a lightweight convex optimization problem that minimizes the L1 action error, without requiring gradient updates. Empirically, MoS-VLA achieves lower action-prediction error on five out of five unseen datasets and succeeds in both simulation and real-robot tasks where a pretrained VLA model fails outright.

Inspired by the success of large language models, modern robotics aims to achieve generalization and human-like performance through the use of internet-scale data and large, attention-based architectures. To this end, researchers have collected enormous datasets of robotic arm trajectories (Open X-Embodiment Collaboration et al., 2023) and trained so-called vision-language-action foundation models to map natural language task descriptions and state observations to robot actions (Kim et al., 2024; Octo Model Team et al., 2024; Brohan et al., 2023b;a; Ma et al., 2024).
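The one-shot adaptation step described in the abstract can be illustrated with a small sketch. It assumes the policy output is a weighted sum of k basis-function outputs and estimates the weights from one demonstration by minimizing the L1 action error. Everything here is an assumption for illustration: the function names, the array shapes, and the random stand-ins for the basis outputs are hypothetical, and the solver is iteratively reweighted least squares (IRLS), a simple approximation to the L1 minimizer rather than the solver used by the authors.

```python
import numpy as np


def fit_skill_coefficients(basis_actions, demo_actions, n_iters=100, eps=1e-8):
    """Approximate argmin_c sum |demo - sum_i c_i * basis_i| using IRLS.

    basis_actions: (T, k, d) array, output of each of k basis functions
                   at each of T demonstration steps (hypothetical layout).
    demo_actions:  (T, d) array of expert actions from one demonstration.
    Returns a length-k coefficient vector.
    """
    T, k, d = basis_actions.shape
    # One row per (timestep, action dimension), one column per basis function.
    G = basis_actions.transpose(0, 2, 1).reshape(T * d, k)
    a = demo_actions.reshape(T * d)

    # Initialize with the ordinary least-squares fit.
    c = np.linalg.lstsq(G, a, rcond=None)[0]
    for _ in range(n_iters):
        residual = np.abs(a - G @ c)
        # Weights 1/|r| turn the weighted squared error into an L1 surrogate.
        sqrt_w = np.sqrt(1.0 / np.maximum(residual, eps))
        c = np.linalg.lstsq(G * sqrt_w[:, None], a * sqrt_w, rcond=None)[0]
    return c


def predict_action(basis_at_state, c):
    """Combine basis outputs of shape (k, d) into one action of shape (d,)."""
    return c @ basis_at_state


# Synthetic stand-in data: 4 basis functions, 7-dimensional actions, 50 steps.
rng = np.random.default_rng(0)
basis = rng.normal(size=(50, 4, 7))
true_c = np.array([0.5, -1.0, 2.0, 0.25])
demo = np.einsum("tkd,k->td", basis, true_c)
demo[::10, 0] += 5.0  # a few corrupted entries; L1 fitting tolerates these

c_hat = fit_skill_coefficients(basis, demo)
print("estimated coefficients:", np.round(c_hat, 3))
print("adapted action at step 0:", np.round(predict_action(basis[0], c_hat), 3))
```

No network weights are updated in this sketch: only the k coefficients change, which mirrors the abstract's claim that adaptation needs no gradient updates. An exact L1 solution could instead be obtained by posing the same problem as a linear program.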
