Balancing Signal and Variance: Adaptive Offline RL Post-Training for VLA Flow Models

Zhang, Hongyin, Zhang, Shiyuan, Jin, Junxi, Zeng, Qixin, Qiao, Yifan, Lu, Hongchao, Wang, Donglin

Sep-5-2025–arXiv.org Artificial Intelligence

Vision-Language-Action (VLA) models based on flow matching have shown excellent performance in general-purpose robotic manipulation tasks. However, the action accuracy of these models on complex downstream tasks is unsatisfactory. One important reason is that these models rely solely on the post-training paradigm of imitation learning, which makes it difficult to have a deeper understanding of the distribution properties of data quality, which is exactly what Reinforcement Learning (RL) excels at. In this paper, we theoretically propose an offline RL post-training objective for VLA flow models and induce an efficient and feasible offline RL fine-tuning algorithm -- Adaptive Reinforced Flow Matching (ARFM). By introducing an adaptively adjusted scaling factor in the VLA flow model loss, we construct a principled bias-variance trade-off objective function to optimally control the impact of RL signal on flow loss. ARFM adaptively balances RL advantage preservation and flow loss gradient variance control, resulting in a more stable and efficient fine-tuning process. Extensive simulation and real-world experimental results show that ARFM exhibits excellent generalization, robustness, few-shot learning, and continuous learning performance.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

Sep-5-2025

arXiv.org PDF

Add feedback

Country:
- Asia > China (0.28)
- North America > United States
  - California (0.28)

Genre:
- Research Report > New Finding (0.34)

Industry:
- Education > Educational Setting > Continuing Education (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Machine Learning > Reinforcement Learning (0.35)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found