$\text{M}^{\text{3}}$: A Modular World Model over Streams of Tokens