Generalist World Model Pre-Training for Efficient Reinforcement Learning
Zhao, Yi, Scannell, Aidan, Hou, Yuxin, Cui, Tianyu, Chen, Le, Büchler, Dieter, Solin, Arno, Kannala, Juho, Pajarinen, Joni
arXiv.org Artificial Intelligence
Sample-efficient robot learning is a longstanding goal in robotics. Inspired by the success of scaling in vision and language, the robotics community is now investigating large-scale offline datasets for robot learning. However, existing methods often require expert and/or reward-labeled task-specific data, which can be costly to collect and limits their practical application. In this paper, we consider a more realistic setting in which the offline data consists of reward-free, non-expert, multi-embodiment trajectories. We show that generalist world model pre-training (WPT), together with retrieval-based experience rehearsal and execution guidance, enables efficient reinforcement learning (RL) and fast task adaptation from such non-curated data. In experiments over 72 visuomotor tasks, spanning 6 different embodiments and covering hard exploration, complex dynamics, and diverse visual properties, WPT achieves 35.65% and 35% higher aggregated scores than two widely used learning-from-scratch baselines, respectively.
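The abstract mentions retrieval-based experience rehearsal as one ingredient of the method. The paper's actual retrieval mechanism is not specified here, but a common pattern is to retrieve the offline transitions most similar to the agent's current state and mix them into the online training batch. The sketch below illustrates that idea under stated assumptions: cosine similarity over flat state embeddings, and `offline_states`/`offline_transitions` as hypothetical names for the pre-collected dataset.

```python
import numpy as np

def retrieve_rehearsal_batch(offline_states, offline_transitions, query_state, k=4):
    """Illustrative sketch of retrieval-based experience rehearsal.

    Returns the k offline transitions whose stored states are most similar
    (by cosine similarity) to the current online state.  The paper's actual
    retrieval metric, embedding space, and data format may differ.
    """
    # Normalize query and stored states so the dot product is cosine similarity.
    q = query_state / (np.linalg.norm(query_state) + 1e-8)
    s = offline_states / (np.linalg.norm(offline_states, axis=1, keepdims=True) + 1e-8)
    sims = s @ q
    # Indices of the k most similar offline states.
    top = np.argsort(-sims)[:k]
    return [offline_transitions[i] for i in top]
```

In use, the retrieved transitions would be concatenated with the agent's own replay samples each update, so the world model keeps rehearsing relevant offline experience while adapting online.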
Feb-26-2025