LMFusion: Adapting Pretrained Language Models for Multimodal Generation
Neural Information Processing Systems
We present LMFusion, a framework for empowering pretrained text-only large language models (LLMs) with multimodal generative capabilities, enabling them to understand and generate both text and images in arbitrary sequences.
Jun-14-2026, 17:02:47 GMT