ACG: Action Coherence Guidance for Flow-based VLA models

Minho Park, Kinam Kim, Junha Hyung, Hyojin Jang, Hoiyeong Jin, Jooyeol Yun, Hojoon Lee, Jaegul Choo

arXiv.org Artificial Intelligence 

Abstract: Diffusion and flow matching models have emerged as powerful robot policies, enabling Vision-Language-Action (VLA) models to generalize across diverse scenes and instructions. Yet, when trained via imitation learning, their high generative capacity makes them sensitive to noise in human demonstrations: jerks, pauses, and jitter, which reduce action coherence. Reduced action coherence causes instability and trajectory drift during deployment, failures that are catastrophic in fine-grained manipulation where precision is crucial. In this paper, we present Action Coherence Guidance (ACG) for VLA models, a training-free test-time guidance algorithm that improves action coherence and thereby yields performance gains. Evaluated on RoboCasa, DexMimicGen, and real-world SO-101 tasks, ACG consistently improves action coherence and boosts success rates across diverse manipulation tasks.

Diffusion and flow matching models are reshaping how robots learn to manipulate objects [1]. These generative models act as robot policies that directly model complex action distributions from human demonstrations, enabling strong generalization across diverse manipulation tasks.
