Learning General World Models in a Handful of Reward-Free Deployments Aldo Pacchiano
–Neural Information Processing Systems
Building generally capable agents is a grand challenge for deep reinforcement learning (RL). To approach this challenge practically, we outline two key desiderata: 1) to facilitate generalization, exploration should be task agnostic; 2) to facilitate scalability, exploration policies should collect large quantities of data without costly centralized retraining. Combining these two properties, we introduce the reward-free deployment efficiency setting, a new paradigm for RL research.
Neural Information Processing Systems
Mar-27-2025, 11:33:47 GMT