Training Optimal Large Diffusion Language Models

Ni, Jinjie, Liu, Qian, Du, Chao, Dou, Longxu, Yan, Hang, Wang, Zili, Pang, Tianyu, Shieh, Michael Qizhe

arXiv.org Artificial Intelligence 

We introduce Quokka, the first systematic scaling law for diffusion language models (DLMs), covering both compute-constrained and data-constrained regimes and studying the key modeling and optimization designs. Quokka is a good friend of Chinchilla, with a wider scope. We hope these results will provide short-term practical guidance for DLM training and long-term inspiration for the broader AI community.
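The Chinchilla comparison can be made concrete with a standard Chinchilla-style compute-optimal allocation. Note this is only an illustrative sketch: the parametric loss form, the `C ≈ 6ND` compute approximation, and every coefficient below are assumptions borrowed from the Chinchilla literature, not Quokka's fitted values, which are not given in this abstract.

```python
import math

# Hypothetical Chinchilla-style parametric loss L(N, D) = E + A/N^alpha + B/D^beta,
# where N is parameter count and D is training tokens. These coefficients are
# placeholders for illustration only, NOT values fitted by Quokka.
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def loss(N, D):
    """Predicted loss under the assumed parametric form."""
    return E + A / N**alpha + B / D**beta

def compute_optimal(C):
    """Allocate a compute budget C (FLOPs) under the C ~= 6*N*D approximation.

    Substituting D = C / (6N) and setting dL/dN = 0 yields the closed form
    N* = G * (C/6)^(beta/(alpha+beta)), with G = (alpha*A / (beta*B))^(1/(alpha+beta)).
    """
    G = (alpha * A / (beta * B)) ** (1 / (alpha + beta))
    N_opt = G * (C / 6) ** (beta / (alpha + beta))
    D_opt = C / (6 * N_opt)
    return N_opt, D_opt

N, D = compute_optimal(1e21)  # e.g. a 1e21-FLOP budget
assert math.isclose(6 * N * D, 1e21, rel_tol=1e-9)
```

A compute-constrained law of this shape answers "given C FLOPs, how large a model and how many tokens?"; the abstract's data-constrained regime additionally caps D, which would turn the same minimization into one over N alone.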