A Survey on Diffusion Language Models

Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

arXiv.org Artificial Intelligence 

A different approach, Reparameterized Discrete diffusion Models (RDMs) [62], establishes an alternative formulation for the reverse process, which simplifies the training objective to a weighted cross-entropy loss. This enables more flexible and adaptive decoding strategies, leading to significant performance gains over previous discrete diffusion models. Similarly, MD4 [63] derives a simple weighted integral of cross-entropy losses as the continuous-time variational objective of masked diffusion models, providing a simple and generalized framework for training DLMs. Another analogous approach is MDLM [64], which introduces a simplified, Rao-Blackwellized objective that takes the form of a weighted average of masked language modeling losses. Diffusion-LLM [65] demonstrates the scalability of DLMs by adapting pre-trained masked language models to the diffusion paradigm, followed by task-specific finetuning and instruction finetuning, unlocking their versatility in solving general language tasks. Diffusion-NAT [66] unifies a discrete diffusion model with a pre-trained language model (PLM) by reformulating the denoising process as a non-autoregressive masked token recovery task, allowing BART to act as an effective denoiser. Plaid [67] is the first diffusion language model trained to maximize data likelihood, demonstrating through scaling laws that it can outperform autoregressive models like GPT-2 on standard benchmarks.

To improve the training objective, SEDD [68] introduces a score entropy loss to directly learn the ratios of the data distribution, which serves as a discrete extension of score matching. Reparameterized Absorbing Discrete Diffusion (RADD) [69] reveals that the concrete score in absorbing diffusion can be expressed as a time-independent conditional probability of the clean data, multiplied by an analytic, time-dependent scalar.
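Several of these formulations (RDM, MD4, MDLM) reduce training to a schedule-weighted cross-entropy over masked positions. The sketch below illustrates that general style of objective for an absorbing-state diffusion LM under a linear masking schedule; the `model` callable, the `mask_id` token, and the linear schedule are illustrative assumptions, not the exact recipe of any single paper.

```python
import torch
import torch.nn.functional as F

def masked_diffusion_loss(model, x0, mask_id, eps=1e-6):
    """Schedule-weighted cross-entropy loss for an absorbing-state
    diffusion LM (schematic sketch of the MD4/MDLM-style objective).

    x0: (B, L) clean token ids; model maps (B, L) ids to (B, L, V) logits.
    """
    B, L = x0.shape
    # Sample a diffusion time t ~ U(0, 1) per sequence.
    t = torch.rand(B, device=x0.device)
    # Linear schedule: alpha_t = 1 - t is the probability a token stays unmasked.
    alpha_t = 1.0 - t
    # Corrupt: each token is independently replaced by [MASK] with prob. 1 - alpha_t.
    masked = torch.rand(B, L, device=x0.device) > alpha_t.unsqueeze(1)
    xt = torch.where(masked, torch.full_like(x0, mask_id), x0)
    # Predict the clean tokens from the partially masked sequence.
    logits = model(xt)                                          # (B, L, V)
    ce = F.cross_entropy(logits.transpose(1, 2), x0,
                         reduction="none")                      # (B, L)
    # Cross-entropy is counted only on masked positions and weighted by the
    # schedule term |alpha_t'| / (1 - alpha_t), which equals 1 / t here.
    weight = 1.0 / (t + eps)
    loss = (weight.unsqueeze(1) * masked.float() * ce).sum() / (B * L)
    return loss
```

In this view, heavily masked sequences (t near 1) receive a small per-token weight while lightly masked ones (t near 0) receive a large weight, so the expectation over t recovers a continuous-time variational bound rather than a plain masked-LM loss.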
