Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
