Learning General World Models in a Handful of Reward-Free Deployments

Yingchen Xu (UCL, FAIR), Jack Parker-Holder (University of Oxford), Aldo Pacchiano (Microsoft Research), Philip J. Ball

Neural Information Processing Systems 

Combining these two properties, we introduce the reward-free deployment efficiency setting, a new paradigm for RL research.
