MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation
Neural Information Processing Systems
The recent popularity of text-to-image diffusion models (DMs) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of text prompts. However, expressing complex or nuanced ideas in text alone can be difficult.
Apr-29-2026, 13:58:00 GMT