Learning General World Models in a Handful of Reward-Free Deployments

Jan-18-2025, 12:28:28 GMT–Neural Information Processing Systems

Building generally capable agents is a grand challenge for deep reinforcement learning (RL). To approach this challenge practically, we outline two key desiderata: 1) to facilitate generalization, exploration should be task-agnostic; 2) to facilitate scalability, exploration policies should collect large quantities of data without costly centralized retraining. Combining these two properties, we introduce the reward-free deployment efficiency setting, a new paradigm for RL research. We then present CASCADE, an approach to exploration in this setting. CASCADE seeks to learn a world model by collecting data with a population of agents, using an information-theoretic objective inspired by Bayesian Active Learning. It does so by maximizing the diversity of trajectories sampled by the population through a novel cascading objective.
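
The idea of choosing a population so that each member adds diversity relative to the members already chosen can be illustrated with a small sketch. This is not the paper's objective: the information-theoretic score is replaced here by plain greedy farthest-point selection over hypothetical trajectory embeddings, and the names `cascade_select` and `candidates` are assumptions made for illustration only.

```python
import numpy as np


def cascade_select(candidates, population_size):
    """Greedily pick `population_size` candidate indices.

    Each pick is scored conditioned on the picks made so far (the
    "cascade"). The score is a stand-in diversity measure (distance to
    the nearest already-chosen candidate), not the paper's objective.
    """
    candidates = np.asarray(candidates, dtype=float)
    if not 0 < population_size <= len(candidates):
        raise ValueError("population_size must be in 1..len(candidates)")

    chosen = []
    for _ in range(population_size):
        best_idx, best_score = None, -np.inf
        for i in range(len(candidates)):
            if i in chosen:
                continue
            if chosen:
                # Diversity relative to the population selected so far.
                dists = np.linalg.norm(candidates[chosen] - candidates[i], axis=1)
                score = dists.min()
            else:
                # First pick: the candidate farthest from the mean embedding.
                score = np.linalg.norm(candidates[i] - candidates.mean(axis=0))
            if score > best_score:
                best_idx, best_score = i, score
        chosen.append(best_idx)
    return chosen


if __name__ == "__main__":
    # Hypothetical 2-D embeddings of four imagined trajectories.
    embeddings = [[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [0.0, 10.0]]
    print(cascade_select(embeddings, population_size=3))
```

In this toy example the two near-duplicate embeddings at the origin are never selected together, which is the behavior a diversity-seeking population objective is meant to encourage.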

cascade, learning general world model, reward-free deployment, (4 more...)

Technology:
- Information Technology > Artificial Intelligence
  - Cognitive Science > Problem Solving (0.64)
  - Machine Learning > Reinforcement Learning (0.62)