A Mixture Of Surprises for Unsupervised Reinforcement Learning

Jan-18-2025, 10:15:23 GMT–Neural Information Processing Systems

Unsupervised reinforcement learning aims at learning a generalist policy in a reward-free manner for fast adaptation to downstream tasks. Most of the existing methods propose to provide an intrinsic reward based on surprise. Maximizing or minimizing surprise drives the agent to either explore or gain control over its environment. However, both strategies rely on a strong assumption: the entropy of the environment's dynamics is either high or low. This assumption may not always hold in real-world scenarios, where the entropy of the environment's dynamics may be unknown.

objective, textbf, unsupervised reinforcement learning, (3 more...)

Neural Information Processing Systems

Jan-18-2025, 10:15:23 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)