AITopics | diffusion-lm

Noise Schedule

Neural Information Processing Systems | Apr-24-2026, 22:45:08 GMT

Because a diffusion model shares parameters across all diffusion steps, the noise schedule (parametrized by $\bar{\alpha}_{1:T}$) is an important hyperparameter that determines how much weight we assign to each denoising problem. We find that standard noise schedules for continuous diffusions are not robust for text data. We hypothesize that the discrete nature of text and the rounding step make the model insensitive to noise near $t = 0$. Concretely, adding a small amount of Gaussian noise to a word embedding is unlikely to change its nearest neighbor in the embedding space, making denoising an easy task near $t = 0$. To address this, we introduce a new sqrt noise schedule that is better suited for text, shown in Figure 5 and defined by $\bar{\alpha}_t = 1 - \sqrt{t/T + s}$, where $s$ is a small constant that corresponds to the starting noise level. Compared to standard linear and cosine schedules, our sqrt schedule starts with a higher noise level and increases noise rapidly for the first 50 steps. After that, the sqrt schedule injects noise more slowly to avoid spending many steps on the high-noise problems, which may be too difficult to solve well.

The hyperparameters that are specific to Diffusion-LM include the number of diffusion steps, the architecture of the Diffusion-LM, the embedding dimension, and the noise schedule. We set the number of diffusion steps to 2000, the architecture to BERT-base [7], and the sequence length to 64. For the embedding dimension, we select from $d \in \{16, 64, 128, 256\}$ and choose $d = 16$ for the E2E dataset and $d = 128$ for ROCStories. For the noise schedule, we design the sqrt schedule (Appendix A), which is more robust to different parametrizations and embedding dimensions, as shown in Appendix M. However, once we picked the $x_0$-parametrization (§4.2), the advantage of the sqrt schedule is not salient.
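
The sqrt schedule in the excerpt above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the value s = 1e-4 is an assumed setting (the excerpt only says s is a small constant), the clamp at zero is an addition of this sketch, and the cosine schedule with offset 0.008 is included only as a commonly used point of comparison.

```python
import math


def sqrt_alpha_bar(t: int, T: int = 2000, s: float = 1e-4) -> float:
    """Cumulative signal level for the sqrt schedule: 1 - sqrt(t/T + s).

    T = 2000 matches the number of diffusion steps in the excerpt;
    s = 1e-4 is an assumed value for the small starting-noise constant.
    """
    # At t = T the raw formula is very slightly negative, so this sketch
    # clamps it at zero (an assumption, not something the excerpt states).
    return max(1.0 - math.sqrt(t / T + s), 0.0)


def cosine_alpha_bar(t: int, T: int = 2000, offset: float = 0.008) -> float:
    """Cumulative signal level for a cosine schedule, for comparison only."""
    def f(u: float) -> float:
        return math.cos((u / T + offset) / (1.0 + offset) * math.pi / 2.0) ** 2

    return f(t) / f(0)


def noise_std(alpha_bar: float) -> float:
    """Standard deviation of the Gaussian noise mixed in at signal level alpha_bar."""
    return math.sqrt(max(1.0 - alpha_bar, 0.0))


if __name__ == "__main__":
    # Print the noise level of both schedules at a few steps.
    for t in (0, 50, 500, 1000, 2000):
        print(
            f"t={t:4d}  sqrt std={noise_std(sqrt_alpha_bar(t)):.3f}  "
            f"cosine std={noise_std(cosine_alpha_bar(t)):.3f}"
        )
```

Under these assumed values, the sqrt schedule already has a nonzero noise standard deviation at t = 0, while the cosine schedule starts from zero noise, which is the contrast the excerpt describes.
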
We train Diffusion-LMs using the AdamW optimizer and a linearly decaying learning rate starting at 1e-4, with dropout of 0.1 and a batch size of 64; the total number of training iterations is 200K for the E2E dataset and 800K for the ROCStories dataset. Our Diffusion-LMs are trained on a single GPU: NVIDIA RTX A5000, NVIDIA GeForce RTX 3090, or NVIDIA A100.
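
As a rough illustration of the linearly decaying learning rate mentioned above, the sketch below computes the rate at a given training step. The function name is hypothetical, and decaying all the way to zero at the final iteration is an assumption; the excerpt gives only the starting value of 1e-4 and the iteration counts.

```python
def linear_decay_lr(step: int, total_steps: int, base_lr: float = 1e-4) -> float:
    """Learning rate that falls linearly from base_lr at step 0.

    Assumption of this sketch: the rate reaches zero at total_steps.
    """
    # Clamp the progress fraction so out-of-range steps stay well defined.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * (1.0 - frac)


if __name__ == "__main__":
    # 200K iterations corresponds to the E2E setting in the excerpt.
    for step in (0, 50_000, 100_000, 200_000):
        print(step, linear_decay_lr(step, total_steps=200_000))
```

For the ROCStories setting, the same function would be called with total_steps=800_000.
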

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Industry:

Consumer Products & Services > Restaurants (1.00)
Leisure & Entertainment > Sports (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

1be5bc25d50895ee656b8c2d9eb89d6a-Paper-Conference.pdf

Neural Information Processing Systems | Apr-24-2026, 22:45:05 GMT

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > California (0.28)

Industry:

Leisure & Entertainment (0.93)
Consumer Products & Services > Restaurants (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

1be5bc25d50895ee656b8c2d9eb89d6a-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-7-2026, 18:33:48 GMT

coffee shop, customer rating, diffusion-lm, (16 more...)

Neural Information Processing Systems

Country:

South America > Brazil (0.04)
North America > United States > California (0.04)

Industry:

Consumer Products & Services > Restaurants (1.00)
Leisure & Entertainment > Sports (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

1be5bc25d50895ee656b8c2d9eb89d6a-Paper-Conference.pdf

Neural Information Processing Systems | Feb-7-2026, 18:33:45 GMT

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Asia > China > Hong Kong (0.04)
(7 more...)

Industry:

Leisure & Entertainment (0.93)
Consumer Products & Services > Restaurants (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Towards Latent Diffusion Suitable For Text

Midavaine, Nesta, Naesseth, Christian A., Bartosh, Grigory

arXiv.org Machine Learning | Jan-26-2026

Language diffusion models aim to improve sampling speed and coherence over autoregressive LLMs. We introduce Neural Flow Diffusion Models for language generation, an extension of NFDM that enables the straightforward application of continuous diffusion models to discrete state spaces. NFDM learns a multivariate forward process from the data, ensuring that the forward process and generative trajectory are a good fit for language modeling. Our model substantially reduces the likelihood gap with autoregressive models of the same size, while achieving sample quality comparable to that of previous latent diffusion models.

forward process, large language model, machine learning, (20 more...)

arXiv.org Machine Learning

2601.1622

Country: Asia (0.46)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.66)

Diffusion-LM Improves Controllable Text Generation

Neural Information Processing Systems | Dec-23-2025, 20:57:00 GMT

Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation. While recent works have demonstrated successes on controlling simple sentence attributes (e.g., sentiment), there has been little progress on complex, fine-grained controls (e.g., syntactic structure). To address this challenge, we develop a new non-autoregressive language model based on continuous diffusions that we call Diffusion-LM.

controllable text generation, electronic proceedings, name change, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Generation (0.61)

Diffusion-LM Improves Controllable Text Generation

Neural Information Processing Systems | May-26-2025, 18:47:54 GMT

Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation. While recent works have demonstrated successes on controlling simple sentence attributes (e.g., sentiment), there has been little progress on complex, fine-grained controls (e.g., syntactic structure). To address this challenge, we develop a new non-autoregressive language model based on continuous diffusions that we call Diffusion-LM. The continuous, hierarchical nature of these intermediate variables enables a simple gradient-based algorithm to perform complex, controllable generation tasks. We demonstrate successful control of Diffusion-LM for six challenging fine-grained control tasks, significantly outperforming prior work.

artificial intelligence, controllable text generation, natural language, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Generation (0.66)

Diffusion-LM Improves Controllable Text Generation

Neural Information Processing Systems | Oct-10-2024, 03:19:11 GMT

Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation. While recent works have demonstrated successes on controlling simple sentence attributes (e.g., sentiment), there has been little progress on complex, fine-grained controls (e.g., syntactic structure). To address this challenge, we develop a new non-autoregressive language model based on continuous diffusions that we call Diffusion-LM. The continuous, hierarchical nature of these intermediate variables enables a simple gradient-based algorithm to perform complex, controllable generation tasks. We demonstrate successful control of Diffusion-LM for six challenging fine-grained control tasks, significantly outperforming prior work.

controllable text generation, diffusion-lm, language model, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Generation (0.66)

Generative Design of inorganic compounds using deep diffusion language models

Dong, Rongzhi, Fu, Nihang, Siriwardane, Edirisuriya M. D., Hu, Jianjun

arXiv.org Artificial Intelligence | Sep-30-2023

Discovering novel synthesizable and stable materials is of fundamental importance to our society. However, chemical innovation is nontrivial. The material composition and structure must satisfy many stringent constraints such as charge neutrality, balanced electronegativity, synthesizability, geometric symmetry, and mechanical stability. Historically, new material discovery has relied on expert heuristics and is usually based on tinkering with existing materials. Several structure generation studies [1, 2] have used brute-force element substitution to generate new structures based on known prototypes. However, the limitation of this permutation-based approach is that it cannot generate new formula prototypes: it can only employ known formulas as templates, generating novel compositions solely through the substitution of elements. With the development of crystal structure prediction algorithms such as CSMPL [3], TCSP [4], and ParetoCSP [5], the generation of chemically stable compositions has emerged as an increasingly critical challenge. Stable compositions play a pivotal role in mitigating the computational demands associated with subsequent stages of analysis.

composition, formula, language model, (15 more...)

arXiv.org Artificial Intelligence

2310.00475

Country:

North America > United States > South Carolina > Richland County > Columbia (0.14)
Europe > Austria > Vienna (0.04)
Asia > Sri Lanka (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control

Han, Xiaochuang, Kumar, Sachin, Tsvetkov, Yulia

arXiv.org Artificial Intelligence | Jun-26-2023

Despite the growing success of diffusion models in continuous-valued domains (e.g., images), similar efforts for discrete domains such as text have yet to match the performance of autoregressive language models. In this work, we present SSD-LM -- a diffusion-based language model with two key design choices. First, SSD-LM is semi-autoregressive, iteratively generating blocks of text, allowing for flexible output length at decoding time while enabling local bidirectional context updates. Second, it is simplex-based, performing diffusion on the natural vocabulary space rather than a learned latent space, allowing us to incorporate classifier guidance and modular control using off-the-shelf classifiers without any adaptation. We evaluate SSD-LM on unconstrained text generation benchmarks, and show that it matches or outperforms strong autoregressive GPT-2 models across standard quality and diversity metrics, while vastly outperforming diffusion-based baselines. On controlled text generation, SSD-LM also outperforms competitive baselines, with an extra advantage in modularity.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2210.17432

Country: