8c0fabe372177d2aded596be2d3b4544-Paper-Conference.pdf

Jun-19-2026, 14:00:37 GMT–Neural Information Processing Systems

Our extensive experiments reveal that RL fine-tuning, particularly with PPO, significantly enhances generalization in semantic understanding and execution robustness over SFT, while maintaining comparable visual robustness. We identify PPO as a more effective RL algorithm for VLAs than LLM-derived methods like DPO and GRPO. We also develop a simple recipe for efficient PPO training on VLAs, and demonstrate its practical utility for improving VLA generalization. The project page is at https://rlvla.github.io.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Jun-19-2026, 14:00:37 GMT

Conferences PDF

Add feedback

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Robots (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found