Neural Information Processing Systems
Our extensive experiments reveal that reinforcement learning (RL) fine-tuning, particularly with PPO, significantly improves generalization in semantic understanding and execution robustness compared with supervised fine-tuning (SFT), while maintaining comparable visual robustness. We identify PPO as a more effective RL algorithm for vision-language-action models (VLAs) than LLM-derived methods such as DPO and GRPO. We also develop a simple recipe for efficient PPO training on VLAs and demonstrate its practical utility for improving VLA generalization. The project page is at https://rlvla.github.io.
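As a rough illustration of one structural difference between the RL algorithms being compared, here is a minimal sketch assuming their standard textbook formulations. It is not the authors' implementation. PPO is assumed to use a learned critic with Generalized Advantage Estimation (GAE). GRPO instead normalizes each rollout's return against a group of rollouts for the same task, with no critic. Both feed a clipped surrogate objective, and GRPO's KL penalty to a reference policy is omitted here. DPO, which learns from preference pairs rather than on-policy rollouts, is not shown. All function names and default hyperparameters (`gamma`, `lam`, `clip_eps`) are hypothetical, chosen for illustration.

```python
import math
from typing import List


def gae_advantages(rewards: List[float], values: List[float],
                   gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """PPO-style advantages via GAE for one finished episode (illustrative).

    `values` holds one critic estimate per step plus a bootstrap value for the
    state after the last step (0.0 if terminal), so len(values) == len(rewards) + 1.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error computed with the learned value function (critic).
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def grpo_advantages(group_rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages (illustrative): normalize each rollout's return by
    the mean and standard deviation of its group; no critic is involved."""
    n = len(group_rewards)
    mean = sum(group_rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in group_rewards) / n)
    return [(r - mean) / (std + eps) for r in group_rewards]


def ppo_clipped_objective(logp_new: List[float], logp_old: List[float],
                          advantages: List[float], clip_eps: float = 0.2) -> float:
    """Mean clipped surrogate objective (to be maximized)."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Toy numbers, for illustration only: two successes, two failures.
    print([round(a, 6) for a in grpo_advantages([1.0, 0.0, 1.0, 0.0])])
```

The design difference this highlights is where the baseline comes from. PPO needs a trained value network, which adds cost but gives per-step credit assignment. GRPO avoids the critic by comparing whole rollouts within a group.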
Jun-19-2026, 14:00:37 GMT