Behavior From the Void: Unsupervised Active Pre-Training

Jan-17-2025, 23:02:13 GMT – Neural Information Processing Systems

We introduce a new unsupervised pre-training method for reinforcement learning called APT, which stands for Active Pre-Training. APT learns behaviors and representations by actively searching for novel states in reward-free environments. The key novel idea is to explore the environment by maximizing a non-parametric entropy computed in an abstract representation space, which avoids challenging density modeling and consequently allows our approach to scale much better in environments that have high-dimensional observations (e.g., image observations). We empirically evaluate APT by exposing task-specific reward after a long unsupervised pre-training phase. In Atari games, APT achieves human-level performance on 12 games and obtains highly competitive performance compared to canonical fully supervised RL algorithms.
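The idea of maximizing a non-parametric entropy in representation space can be illustrated with a small sketch. The abstract does not say which estimator is used, so the code below assumes a k-nearest-neighbor particle estimate: a state is rewarded more when its encoded vector lies far from its nearest neighbors in a batch. The function name `knn_entropy_reward` and the parameters `k` and `c` are hypothetical choices for this illustration, and the encoder that produces the representations is not shown.

```python
import numpy as np


def knn_entropy_reward(z, k=3, c=1.0):
    """Illustrative intrinsic reward from a k-NN entropy estimate.

    z : array of shape (n, d), encoded states (the encoder is assumed, not shown).
    k : number of nearest neighbors to average over (hypothetical default).
    c : positive constant keeping the logarithm well defined (hypothetical default).
    Returns an array of shape (n,), one reward per state.
    """
    z = np.asarray(z, dtype=np.float64)
    n = z.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of states")

    # Pairwise Euclidean distances between all encoded states.
    diff = z[:, None, :] - z[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # A state must not count as its own neighbor.
    np.fill_diagonal(dist, np.inf)

    # The k smallest distances in each row (order within the k is irrelevant).
    knn = np.partition(dist, k - 1, axis=1)[:, :k]

    # Larger average distance to neighbors means a more novel state.
    return np.log(c + knn.mean(axis=1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.normal(size=(8, 4))  # stand-in for encoder outputs
    print(knn_entropy_reward(batch, k=3))
```

No density model is fit here: novelty is read off directly from distances between samples, which is the property the abstract credits for scaling to high-dimensional observations. In a reward-free pre-training loop, a quantity of this kind would take the place of the task reward.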

apt, unsupervised active pre-training, void

Industry:
- Leisure & Entertainment > Games > Computer Games (0.66)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)