Discover, Learn, and Reinforce: Scaling Vision-Language-Action Pretraining with Diverse RL-Generated Trajectories
Yang, Rushuai, Feng, Zhiyuan, Zhang, Tianxiang, Wang, Kaixin, Zhang, Chuheng, Zhao, Li, Su, Xiu, Chen, Yi, Bian, Jiang
–arXiv.org Artificial Intelligence
Scaling vision-language-action (VLA) model pretraining requires large volumes of diverse, high-quality manipulation trajectories. Most current data is obtained via human teleoperation, which is expensive and difficult to scale. Reinforcement learning (RL) methods learn useful skills through autonomous exploration, making them a viable approach for generating such data. However, standard RL training collapses to a narrow execution pattern, limiting its utility for large-scale pretraining. We propose Discover, Learn and Reinforce (DLR), an information-theoretic pattern-discovery framework that generates multiple distinct, high-success behavioral patterns for VLA pretraining. Empirically, DLR generates a markedly more diverse trajectory corpus on LIBERO. Specifically, it learns multiple distinct, high-success strategies for the same task where standard RL discovers only one, and hence covers substantially broader regions of the state-action space. When adapted to unseen downstream task suites, VLA models pretrained on our diverse RL data surpass counterparts trained on equal-sized standard RL datasets. Moreover, DLR exhibits positive data-scaling behavior that single-pattern RL lacks. These results position multi-pattern RL as a practical, scalable data engine for embodied foundation models.
Nov-26-2025
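
The abstract does not spell out DLR's objective, so the following is only a minimal sketch of one common way to encourage multiple distinct, high-success behavioral patterns: a DIAYN-style intrinsic reward that maximizes a lower bound on the mutual information between a latent pattern variable z and visited states, added on top of the task reward so each pattern stays successful. All names here (PatternDiscriminator, diversity_reward, total_reward) and the discriminator-based bonus itself are illustrative assumptions, not the paper's actual formulation.

```python
# Hedged sketch (assumption): a DIAYN-style diversity bonus that could drive
# multi-pattern discovery. Names and structure are illustrative, not DLR's
# actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatternDiscriminator(nn.Module):
    """Predicts which latent pattern z generated a given state s."""

    def __init__(self, state_dim: int, num_patterns: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_patterns),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Logits over the discrete set of patterns.
        return self.net(state)


def diversity_reward(disc: PatternDiscriminator,
                     state: torch.Tensor,
                     pattern: torch.Tensor,
                     num_patterns: int) -> torch.Tensor:
    """Intrinsic reward log q(z|s) - log p(z): large when the visited state
    reveals which pattern the policy is executing (a mutual-information
    lower bound, assuming a uniform prior over patterns)."""
    log_q = F.log_softmax(disc(state), dim=-1)                  # (B, K)
    log_q_z = log_q.gather(-1, pattern.unsqueeze(-1)).squeeze(-1)  # (B,)
    log_p_z = -torch.log(torch.tensor(float(num_patterns)))
    return log_q_z - log_p_z


def total_reward(task_r: torch.Tensor,
                 disc: PatternDiscriminator,
                 state: torch.Tensor,
                 pattern: torch.Tensor,
                 num_patterns: int,
                 beta: float = 0.1) -> torch.Tensor:
    """Keep the task reward so every pattern remains high-success, and add a
    weighted diversity bonus so the patterns stay distinct from one another."""
    return task_r + beta * diversity_reward(disc, state, pattern, num_patterns)
```

In this sketch, trajectories rolled out under different values of z would then be pooled into the pretraining corpus, mirroring the multi-pattern data-generation role the abstract describes.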