Multimodal Latent Language Modeling with Next-Token Diffusion