Learning Transformer-based World Models with Contrastive Predictive Coding
–arXiv.org Artificial Intelligence
The DreamerV3 algorithm recently achieved remarkable performance across diverse environment domains by learning an accurate world model based on Recurrent Neural Networks (RNNs). Following the success of model-based reinforcement learning algorithms and the rapid adoption of the Transformer architecture for its superior training efficiency and favorable scaling properties, recent works such as STORM have proposed replacing RNN-based world models with Transformer-based world models that use masked self-attention. However, despite their improved training efficiency, these methods have had limited impact on performance compared to the Dreamer algorithm and struggle to learn competitive Transformer-based world models. In this work, we show that the next-state prediction objective adopted in previous approaches is insufficient to fully exploit the representation capabilities of Transformers. We propose to extend world model predictions to longer time horizons by introducing TWISTER (Transformer-based World model wIth contraSTivE Representations), a world model that uses action-conditioned Contrastive Predictive Coding to learn high-level temporal feature representations and improve agent performance. TWISTER achieves a human-normalized mean score of 162% on the Atari 100k benchmark, setting a new record among state-of-the-art methods that do not employ look-ahead search. We release our code at https://github.com/burchim/TWISTER.
Figure: TWISTER outperforms other model-based approaches. TWM, IRIS, STORM and Δ-IRIS employ a Transformer-based world model, while DreamerV3 uses an RNN-based model.
Following the success of neural networks in solving reinforcement learning problems, model-based approaches that learn world models using gradient backpropagation were proposed to reduce the amount of necessary interaction …
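Contrastive Predictive Coding trains a model to pick out the true future representation from a set of distractors using the InfoNCE loss. The sketch below is a minimal, framework-free illustration built on our own assumptions, not TWISTER's implementation (the released repository has that): the linear predictor `predict_future` is hypothetical, the prediction for step t+k is assumed to be conditioned on the context at step t plus the actions taken in between (which is what "action-conditioned" refers to here), and cosine similarity with a temperature of 0.1, with the other batch elements used as negatives, are illustrative choices.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def predict_future(weights, context, future_actions):
    """Hypothetical action-conditioned predictor: a linear map applied to the
    context vector concatenated with the flattened actions a_t .. a_{t+k-1}."""
    features = list(context) + [a for step in future_actions for a in step]
    return [dot(row, features) for row in weights]


def info_nce(predictions, targets, temperature=0.1):
    """InfoNCE loss averaged over the batch.

    Prediction i is scored against every target by cosine similarity; target i
    is the positive and the other targets in the batch serve as negatives.
    """
    preds = [normalize(p) for p in predictions]
    tgts = [normalize(t) for t in targets]
    total = 0.0
    for i, p in enumerate(preds):
        logits = [dot(p, t) / temperature for t in tgts]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # equals -log softmax(logits)[i]
    return total / len(preds)


# Usage: a 2-D context, two future actions of size 1, a 2-D predicted feature.
pred = predict_future([[1, 0, 1, 0], [0, 1, 0, 1]], [0.5, -0.5], [[1.0], [0.0]])

# Four samples whose true future representations are orthogonal unit vectors.
targets = [[1.0 if j == i else 0.0 for j in range(4)] for i in range(4)]
aligned = info_nce(targets, targets)                     # matching predictions
shuffled = info_nce(targets[1:] + targets[:1], targets)  # mismatched predictions
```

Matching predictions drive the loss toward 0, while predictions that point at another sample's future are penalized heavily; with uninformative predictions every candidate is equally likely and the loss equals log of the batch size.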
World models (Sutton, 1991; Ha & Schmidhuber, 2018) summarize an agent's experience into a predictive model that can be used in place of the real environment to learn complex behaviors. Having access to a model of the environment enables the agent to simulate multiple plausible trajectories in parallel, improving generalization, sample efficiency and decision-making via planning.
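The idea of simulating plausible trajectories with a model instead of the real environment can be sketched as follows. Everything here is an assumption made for illustration: `toy_world_model` is a hand-written, hypothetical stand-in for a learned model (a 1-D state with Gaussian transition noise and a reward for staying near 0), `steer_to_zero` is a hypothetical policy, and `plan` scores each candidate first action by the mean discounted return of imagined rollouts, a simple random-shooting style of planning. The abstract places TWISTER among methods that do not employ look-ahead search, so this is a generic picture of how a world model can support decision-making, not TWISTER's procedure.

```python
import random


def toy_world_model(state, action, rng):
    """Hypothetical stand-in for a learned world model: returns a sampled next
    state and a predicted reward. A real model would be a trained network."""
    next_state = state + action + rng.gauss(0.0, 0.1)  # stochastic transition
    reward = -abs(next_state)  # this toy task rewards staying close to 0
    return next_state, reward


def imagine(model, start_state, first_action, policy, horizon, num_rollouts,
            gamma=0.99, seed=0):
    """Mean discounted return of several imagined trajectories that start from
    the same state with the same first action. No real environment is queried.
    The rollouts are independent, so they could be batched and run in parallel;
    here they run in a loop for clarity."""
    rng = random.Random(seed)  # same seed per call: candidates share the noise
    returns = []
    for _ in range(num_rollouts):
        state, action, ret = start_state, first_action, 0.0
        for t in range(horizon):
            state, reward = model(state, action, rng)
            ret += (gamma ** t) * reward
            action = policy(state)
        returns.append(ret)
    return sum(returns) / len(returns)


def plan(model, state, candidate_actions, policy, horizon=10, num_rollouts=16):
    """Choose the first action whose imagined trajectories score best."""
    return max(candidate_actions,
               key=lambda a: imagine(model, state, a, policy, horizon, num_rollouts))


def steer_to_zero(state):
    """Hypothetical policy: move halfway back toward 0."""
    return -0.5 * state


# Usage: start away from 0 and compare three candidate first actions.
best = plan(toy_world_model, 2.0, [-1.0, 0.0, 1.0], steer_to_zero)
```

Reusing the same seed for every candidate (common random numbers) compares the candidates under identical model noise, which reduces the variance of the comparison; here the action that moves toward 0 wins.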
Mar-6-2025
- Country:
- Europe > Germany > Bavaria > Lower Franconia > Würzburg (0.04)
- Genre:
- Research Report > Promising Solution (0.34)
- Industry:
- Leisure & Entertainment > Games > Computer Games (0.48)