Long term Consistent
–Neural Information Processing Systems
World simulation has gained increasing popularity due to its ability to model virtual environments and predict the consequences of actions. However, the limited temporal context window often leads to failures in maintaining long-term consistency, particularly in preserving 3D spatial consistency. In this work, we present WORLDMEM, a framework that enhances scene generation with a memory bank consisting of memory units that store memory frames and states (e.g., poses and timestamps). By employing state-aware memory attention that effectively extracts relevant information from these memory frames based on their states, our method is capable of accurately reconstructing previously observed scenes, even under significant viewpoint or temporal gaps.
Neural Information Processing Systems
Jun-16-2026, 23:23:41 GMT
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Information Technology (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Representation & Reasoning (1.00)
- Machine Learning > Neural Networks (1.00)
- Natural Language > Large Language Model (0.94)
- Information Technology > Artificial Intelligence