Training Optimal Large Diffusion Language Models
Jinjie Ni, Qian Liu, Chao Du, Longxu Dou, Hang Yan, Zili Wang, Tianyu Pang, Michael Qizhe Shieh
arXiv.org Artificial Intelligence
We introduce Quokka, the first systematic scaling law for diffusion language models (DLMs), encompassing both compute-constrained and data-constrained regimes, and studying the key modeling and optimization designs. Quokka complements Chinchilla and covers a wider scope. We hope the results will offer short-term practical guidance for DLM training and long-term inspiration for the broader AI community.
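The excerpt does not state Quokka's functional form, but Chinchilla-style scaling laws typically model loss as a sum of power-law terms in model size N and training tokens D, then minimize loss under a compute budget. A minimal sketch, assuming the standard parameterization L(N, D) = E + A/N^α + B/D^β (the coefficient values below are illustrative placeholders, not Quokka's fitted values):

```python
import numpy as np

# Hypothetical Chinchilla-style parametric loss (an assumption; the
# excerpt does not give Quokka's actual form or coefficients).
def loss(N, D, E=1.7, A=400.0, alpha=0.34, B=410.0, beta=0.28):
    return E + A / N**alpha + B / D**beta

# Under the common approximation compute C ~ 6*N*D, the compute-optimal
# model size minimizes loss subject to N*D = C/6; a grid search over
# candidate model sizes illustrates the tradeoff.
def optimal_N(C):
    Ns = np.logspace(6, 12, 2000)   # candidate parameter counts
    Ds = C / (6.0 * Ns)             # tokens implied by the budget
    return Ns[np.argmin(loss(Ns, Ds))]

# Larger compute budgets favor larger models under this parameterization.
print(optimal_N(1e21) < optimal_N(1e23))  # → True
```

A data-constrained variant would instead fix D at the available token count and minimize over N alone, which is one of the regimes the abstract says Quokka covers.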
Nov-6-2025