A Survey on Diffusion Language Models

Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

arXiv.org Artificial Intelligence 

A different approach, Reparameterized Discrete diffusion Models (RDMs) [62], establishes an alternative formulation for the reverse process, which simplifies the training objective to a weighted cross-entropy loss. This enables more flexible and adaptive decoding strategies, leading to significant performance gains over previous discrete diffusion models. Similarly, MD4 [63] derives a simple weighted integral of cross-entropy losses as the continuous-time variational objective of masked diffusion models, providing a simple and generalized framework for training DLMs. Another analogous approach is MDLM [64], which introduces a simplified, Rao-Blackwellized objective that takes the form of a weighted average of masked language modeling losses. Diffusion-LLM [65] demonstrates the scalability of DLMs by adapting pre-trained masked language models to the diffusion paradigm, followed by task-specific finetuning and instruction finetuning, unlocking their versatility in solving general language tasks. Diffusion-NAT [66] unifies a discrete diffusion model with a pre-trained language model (PLM) by reformulating the denoising process as a non-autoregressive masked token recovery task, allowing BART to act as an effective denoiser. Plaid [67] is the first diffusion language model trained to maximize data likelihood, demonstrating through scaling laws that it can outperform autoregressive models like GPT-2 on standard benchmarks.

To improve the training objective, SEDD [68] introduces a score entropy loss to directly learn the ratios of the data distribution, which serves as a discrete extension of score matching. Reparameterized Absorbing Discrete Diffusion (RADD) [69] reveals that the concrete score in absorbing diffusion can be expressed as a time-independent conditional probability of the clean data, multiplied by an analytic, time-dependent scalar.
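Several of these formulations (RDM, MD4, MDLM) reduce training to a schedule-weighted cross-entropy over masked positions. The sketch below illustrates that general style of objective for an absorbing-state diffusion LM under a linear masking schedule; the `model` callable, the `mask_id` token, and the linear schedule are illustrative assumptions, not the exact recipe of any single paper.

```python
import torch
import torch.nn.functional as F

def masked_diffusion_loss(model, x0, mask_id, eps=1e-6):
    """Schedule-weighted cross-entropy loss for an absorbing-state
    diffusion LM (schematic sketch of the MD4/MDLM-style objective).

    x0: (B, L) clean token ids; model maps (B, L) ids to (B, L, V) logits.
    """
    B, L = x0.shape
    # Sample a diffusion time t ~ U(0, 1) per sequence.
    t = torch.rand(B, device=x0.device)
    # Linear schedule: alpha_t = 1 - t is the probability a token stays unmasked.
    alpha_t = 1.0 - t
    # Corrupt: each token is independently replaced by [MASK] with prob. 1 - alpha_t.
    masked = torch.rand(B, L, device=x0.device) > alpha_t.unsqueeze(1)
    xt = torch.where(masked, torch.full_like(x0, mask_id), x0)
    # Predict the clean tokens from the partially masked sequence.
    logits = model(xt)                                          # (B, L, V)
    ce = F.cross_entropy(logits.transpose(1, 2), x0,
                         reduction="none")                      # (B, L)
    # Cross-entropy is counted only on masked positions and weighted by the
    # schedule term |alpha_t'| / (1 - alpha_t), which equals 1 / t here.
    weight = 1.0 / (t + eps)
    loss = (weight.unsqueeze(1) * masked.float() * ce).sum() / (B * L)
    return loss
```

In this view, heavily masked sequences (t near 1) receive a small per-token weight while lightly masked ones (t near 0) receive a large weight, so the expectation over t recovers a continuous-time variational bound rather than a plain masked-LM loss.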
