Unlocking the Power of Representations in Long-term Novelty-based Exploration
Saade, Alaa, Kapturowski, Steven, Calandriello, Daniele, Blundell, Charles, Sprechmann, Pablo, Sarra, Leopoldo, Groth, Oliver, Valko, Michal, Piot, Bilal
arXiv.org Artificial Intelligence
We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a nonparametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space. By adapting classical clustering to the nonstationary setting of Deep RL, RECODE can efficiently track state visitation counts over thousands of episodes. We further propose a novel generalization of the inverse dynamics loss that leverages masked transformer architectures for multi-step prediction and that, in conjunction with RECODE, achieves a new state of the art in a suite of challenging 3D exploration tasks in DM-HARD-8. RECODE also sets a new state of the art in hard exploration Atari games, and is the first ...

Figure 1: A key result of RECODE is that it allows us to leverage more powerful state representations for long-term novelty estimation, enabling us to achieve a new state of the art in the challenging 3D task suite DM-HARD-8.
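The idea of estimating visitation counts for clusters of embedded states can be illustrated with a small sketch. This is a simplified toy under stated assumptions, not the update rule used by RECODE: the class name `ClusterCounter`, the parameters `radius`, `decay`, and `max_clusters`, the evict-least-visited policy, and the `1 / sqrt(count)` bonus are all hypothetical choices made for illustration.

```python
import math


class ClusterCounter:
    """Toy online visitation counter over clusters of embeddings (illustrative only)."""

    def __init__(self, radius=1.0, decay=0.999, max_clusters=100):
        self.radius = radius              # assumed distance threshold for joining a cluster
        self.decay = decay                # assumed per-step discount so old visits fade
        self.max_clusters = max_clusters  # assumed fixed memory budget
        self.centers = []                 # cluster centers (lists of floats)
        self.counts = []                  # soft visitation count per cluster

    def update(self, embedding):
        """Record one visit and return a novelty bonus of 1 / sqrt(count)."""
        # Discount all counts so the estimate can follow a changing policy.
        self.counts = [c * self.decay for c in self.counts]

        # Find the nearest existing cluster center, if any.
        best, best_dist = None, math.inf
        for i, center in enumerate(self.centers):
            d = math.dist(center, embedding)
            if d < best_dist:
                best, best_dist = i, d

        if best is not None and best_dist <= self.radius:
            # Close enough: count the visit and nudge the center toward the point.
            self.counts[best] += 1.0
            n = self.counts[best]
            self.centers[best] = [
                c + (x - c) / n for c, x in zip(self.centers[best], embedding)
            ]
        else:
            if len(self.centers) >= self.max_clusters:
                # Memory full: evict the least-visited cluster (an assumed policy).
                evict = min(range(len(self.counts)), key=self.counts.__getitem__)
                del self.centers[evict]
                del self.counts[evict]
            # Start a new cluster at the unseen embedding.
            self.centers.append(list(embedding))
            self.counts.append(1.0)
            best = len(self.centers) - 1

        return 1.0 / math.sqrt(self.counts[best])
```

In this sketch, an embedding that lands near a frequently visited cluster receives a small bonus, while an embedding far from every stored center starts a new cluster and receives the maximum bonus of 1. The discount on counts is one simple way to cope with nonstationarity; the quality of the bonus depends entirely on the embedding space in which distances are measured.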
May-2-2023
- Genre:
  - Research Report > New Finding (0.46)
- Industry:
  - Leisure & Entertainment > Games > Computer Games (0.54)
- Technology: