Generalist World Model Pre-Training for Efficient Reinforcement Learning
Zhao, Yi, Scannell, Aidan, Hou, Yuxin, Cui, Tianyu, Chen, Le, Büchler, Dieter, Solin, Arno, Kannala, Juho, Pajarinen, Joni
arXiv.org Artificial Intelligence
Sample-efficient robot learning is a longstanding goal in robotics. Inspired by the success of scaling in vision and language, the robotics community is now investigating large-scale offline datasets for robot learning. However, existing methods often require expert and/or reward-labeled task-specific data, which can be costly to collect and limits their practical application. In this paper, we consider a more realistic setting in which the offline data consists of reward-free, non-expert, multi-embodiment trajectories. We show that generalist world model pre-training (WPT), together with retrieval-based experience rehearsal and execution guidance, enables efficient reinforcement learning (RL) and fast task adaptation from such non-curated data. In experiments over 72 visuomotor tasks, spanning 6 different embodiments and covering hard exploration, complex dynamics, and diverse visual properties, WPT achieves 35.65% and 35% higher aggregated scores than two widely used learning-from-scratch baselines, respectively.
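The abstract mentions retrieval-based experience rehearsal as one ingredient of the method. The paper's actual retrieval mechanism is not specified here, but a common pattern is to retrieve the offline transitions most similar to the agent's current state and mix them into the online training batch. The sketch below illustrates that idea under stated assumptions: cosine similarity over flat state embeddings, and `offline_states`/`offline_transitions` as hypothetical names for the pre-collected dataset.

```python
import numpy as np

def retrieve_rehearsal_batch(offline_states, offline_transitions, query_state, k=4):
    """Illustrative sketch of retrieval-based experience rehearsal.

    Returns the k offline transitions whose stored states are most similar
    (by cosine similarity) to the current online state.  The paper's actual
    retrieval metric, embedding space, and data format may differ.
    """
    # Normalize query and stored states so the dot product is cosine similarity.
    q = query_state / (np.linalg.norm(query_state) + 1e-8)
    s = offline_states / (np.linalg.norm(offline_states, axis=1, keepdims=True) + 1e-8)
    sims = s @ q
    # Indices of the k most similar offline states.
    top = np.argsort(-sims)[:k]
    return [offline_transitions[i] for i in top]
```

In use, the retrieved transitions would be concatenated with the agent's own replay samples each update, so the world model keeps rehearsing relevant offline experience while adapting online.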
Feb-26-2025