Enhancing Memory and Imagination Consistency in Diffusion-based World Models via Linear-Time Sequence Modeling

Jia-Hua Lee, Bor-Jiun Lin, Wei-Fang Sun, Chun-Yi Lee

arXiv.org Artificial Intelligence 

World models are crucial for enabling agents to simulate and plan within environments, yet existing approaches struggle with long-term dependencies and inconsistent predictions. We introduce EDELINE, a novel framework that integrates diffusion models with linear-time state space models to enhance memory retention and temporal consistency. EDELINE employs a recurrent embedding module based on Mamba SSMs to process unbounded sequences, a unified architecture for joint reward and termination prediction, and dynamic loss harmonization to balance multi-task learning. Results across multiple benchmarks demonstrate that EDELINE outperforms prior baselines and remains robust on long-horizon tasks.
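To make the described architecture concrete, the sketch below illustrates the three components the abstract names: a linear-time recurrent embedding over an unbounded observation stream, shared heads for joint reward and termination prediction, and a harmonized multi-task loss. This is a minimal sketch under stated assumptions, not the paper's implementation: the gated scan is a simplified stand-in for a Mamba SSM block, the module names and dimensions are hypothetical, and the uncertainty-style loss weighting (Kendall et al., 2018) is one plausible choice for dynamic loss harmonization. The hidden state would condition the diffusion denoiser for frame prediction, which is omitted here.

```python
# Minimal sketch of an EDELINE-style pipeline. The recurrent scan is a
# simplified linear-time stand-in for a Mamba SSM block; names, shapes, and
# the loss weighting are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn


class RecurrentEmbedding(nn.Module):
    """Linear-time recurrent embedding over an unbounded observation stream."""

    def __init__(self, obs_dim: int, hidden_dim: int):
        super().__init__()
        self.in_proj = nn.Linear(obs_dim, hidden_dim)
        self.gate = nn.Linear(obs_dim, hidden_dim)  # input-dependent decay, SSM-like

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim) -> hidden states (batch, time, hidden_dim).
        # Cost is linear in sequence length; no attention over past frames.
        h = torch.zeros(obs_seq.size(0), self.in_proj.out_features, device=obs_seq.device)
        outs = []
        for t in range(obs_seq.size(1)):
            x_t = obs_seq[:, t]
            a_t = torch.sigmoid(self.gate(x_t))           # per-step forgetting factor
            h = a_t * h + (1 - a_t) * torch.tanh(self.in_proj(x_t))
            outs.append(h)
        # Each h_t would also condition the diffusion denoiser (omitted here).
        return torch.stack(outs, dim=1)


class WorldModelHeads(nn.Module):
    """Joint reward and termination prediction from the shared hidden state."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.reward = nn.Linear(hidden_dim, 1)
        self.done = nn.Linear(hidden_dim, 1)  # termination logit

    def forward(self, h: torch.Tensor):
        return self.reward(h).squeeze(-1), self.done(h).squeeze(-1)


def harmonized_loss(losses, log_vars):
    """Balance the diffusion, reward, and termination objectives.

    log_vars are learnable scalars (one per task); exp(-s) down-weights
    noisy tasks while the +s term prevents collapse to zero weight.
    """
    total = 0.0
    for loss, s in zip(losses, log_vars):
        total = total + torch.exp(-s) * loss + s
    return total


# Usage: embed a length-16 rollout, then predict per-step reward/termination.
emb = RecurrentEmbedding(obs_dim=64, hidden_dim=128)
heads = WorldModelHeads(hidden_dim=128)
h = emb(torch.randn(2, 16, 64))  # batch of 2 rollouts
r_hat, d_logit = heads(h)
```

One design point this sketch preserves: because the embedding is a recurrence rather than attention, memory cost does not grow with the rollout horizon, which is what allows conditioning on unbounded history in long-horizon tasks.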