Training Optimal Large Diffusion Language Models

Ni, Jinjie, Liu, Qian, Du, Chao, Dou, Longxu, Yan, Hang, Wang, Zili, Pang, Tianyu, Shieh, Michael Qizhe

arXiv.org Artificial Intelligence 

We introduce Quokka, the first systematic scaling law for diffusion language models (DLMs), covering both compute-constrained and data-constrained regimes and studying the key modeling and optimization designs. Quokka is a good friend of Chinchilla, with a wider scope. We hope these results will provide short-term practical guidance for DLM training and long-term inspiration for the broader AI community.
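The Chinchilla comparison can be made concrete with a standard Chinchilla-style compute-optimal allocation. Note this is only an illustrative sketch: the parametric loss form, the `C ≈ 6ND` compute approximation, and every coefficient below are assumptions borrowed from the Chinchilla literature, not Quokka's fitted values, which are not given in this abstract.

```python
import math

# Hypothetical Chinchilla-style parametric loss L(N, D) = E + A/N^alpha + B/D^beta,
# where N is parameter count and D is training tokens. These coefficients are
# placeholders for illustration only, NOT values fitted by Quokka.
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def loss(N, D):
    """Predicted loss under the assumed parametric form."""
    return E + A / N**alpha + B / D**beta

def compute_optimal(C):
    """Allocate a compute budget C (FLOPs) under the C ~= 6*N*D approximation.

    Substituting D = C / (6N) and setting dL/dN = 0 yields the closed form
    N* = G * (C/6)^(beta/(alpha+beta)), with G = (alpha*A / (beta*B))^(1/(alpha+beta)).
    """
    G = (alpha * A / (beta * B)) ** (1 / (alpha + beta))
    N_opt = G * (C / 6) ** (beta / (alpha + beta))
    D_opt = C / (6 * N_opt)
    return N_opt, D_opt

N, D = compute_optimal(1e21)  # e.g. a 1e21-FLOP budget
assert math.isclose(6 * N * D, 1e21, rel_tol=1e-9)
```

A compute-constrained law of this shape answers "given C FLOPs, how large a model and how many tokens?"; the abstract's data-constrained regime additionally caps D, which would turn the same minimization into one over N alone.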