Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion

Zhuo Li, Junjia Liu, Zhipeng Dong, Tao Teng, Quentin Rouxel, Darwin Caldwell, Fei Chen

arXiv.org Artificial Intelligence 

Pre-trained Vision-Language-Action (VLA) policies suffer from substantial performance degradation during downstream deployment. Although fine-tuning can mitigate this issue, its reliance on costly demonstration collection and intensive computation makes it impractical in real-world settings. In this work, we introduce VLA-Pilot, a plug-and-play inference-time policy steering method for zero-shot deployment of pre-trained VLA policies without any additional fine-tuning or data collection. We evaluate VLA-Pilot on six real-world downstream manipulation tasks across two distinct robotic embodiments, encompassing both in-distribution and out-of-distribution scenarios. Experimental results demonstrate that VLA-Pilot substantially boosts the success rates of off-the-shelf pre-trained VLA policies, enabling robust zero-shot generalization to diverse tasks and embodiments. Experimental videos and code are available at: https://rip4kobe.github.io/vla-pilot/.

I. INTRODUCTION

Recent advances in VLA models have substantially improved the generalization capabilities of robotic manipulation. By learning from large-scale demonstrations [1], these generative foundation policies enable robots to acquire a wide repertoire of skills. At inference time, they can perform diverse and contextually appropriate tasks by stochastically sampling actions from the learned skill distribution.
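The "stochastic sampling from the learned skill distribution" mentioned above can be illustrated with a minimal DDPM-style reverse-diffusion sketch. Everything here is an assumption for illustration, not the paper's method: `eps_fn` stands in for the learned noise-prediction network (which a real VLA policy would condition on vision-language observations), and the noise schedule values are generic, not taken from the paper.

```python
# Minimal sketch (illustrative only): inference-time action sampling
# from a diffusion policy via DDPM-style reverse diffusion.
import numpy as np

def sample_action(eps_fn, action_dim, n_steps=50, rng=None):
    """Draw one action vector by iteratively denoising Gaussian noise.

    eps_fn(a, t) is a placeholder for a trained noise-prediction network.
    """
    rng = rng or np.random.default_rng(0)
    # Linear noise schedule (generic values, assumed for illustration).
    betas = np.linspace(1e-4, 0.02, n_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    a = rng.standard_normal(action_dim)  # start from pure Gaussian noise
    for t in reversed(range(n_steps)):
        eps = eps_fn(a, t)               # predicted noise at step t
        # DDPM posterior-mean update toward the learned action distribution
        a = (a - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                        # inject noise except at the final step
            a += np.sqrt(betas[t]) * rng.standard_normal(action_dim)
    return a

# Dummy denoiser that predicts zero noise, standing in for a trained model;
# a 7-DoF action vector is a common choice for manipulation.
action = sample_action(lambda a, t: np.zeros_like(a), action_dim=7)
print(action.shape)
```

Because sampling is stochastic, repeated calls with different seeds yield different actions from the same policy, which is exactly the multi-modality that inference-time steering methods can exploit by selecting among candidate samples.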