$\text{M}^{\text{3}}$: A Modular World Model over Streams of Tokens

Cohen, Lior, Wang, Kaixin, Kang, Bingyi, Gadot, Uri, Mannor, Shie

Feb-17-2025–arXiv.org Artificial Intelligence

Token-based world models emerged as a promising modular framework, modeling dynamics over token streams while optimizing tokenization separately. While successful in visual environments with discrete actions (e.g., Atari games), their broader applicability remains uncertain. In this paper, we introduce $\text{M}^{\text{3}}$, a $\textbf{m}$odular $\textbf{w}$orld $\textbf{m}$odel that extends this framework, enabling flexible combinations of observation and action modalities through independent modality-specific components. $\text{M}^{\text{3}}$ integrates several improvements from existing literature to enhance agent performance. Through extensive empirical evaluation across diverse benchmarks, $\text{M}^{\text{3}}$ achieves state-of-the-art sample efficiency for planning-free world models. Notably, among these methods, it is the first to reach a human-level median score on Atari 100K, with superhuman performance on 13 games. We $\href{https://github.com/leor-c/M3}{\text{open-source our code and weights}}$.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

Feb-17-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.46)
- Europe > Austria (0.28)

Genre:
- Research Report (0.82)

Industry:
- Leisure & Entertainment > Games > Computer Games (0.86)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Cognitive Science > Problem Solving (0.94)
  - Natural Language > Large Language Model (0.69)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found