MMaDA: Multimodal Large Diffusion Language Models

Jun-14-2026, 04:17:48 GMT–Neural Information Processing Systems

We introduce MMaDA, a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and text-to-image generation. The approach is distinguished by three key innovations: (i) MMaDA adopts a unified diffusion architecture with a shared probabilistic formulation and a modality-agnostic design, eliminating the need for modality-specific components. This architecture ensures seamless integration and processing across different data types.

artificial intelligence, machine learning, proceedings



Technology:
- Information Technology > Artificial Intelligence
  - Vision (0.64)
  - Machine Learning > Neural Networks (0.41)