Noise Schedule

Neural Information Processing Systems

Because a diffusion model shares parameters across all diffusion steps, the noise schedule (parametrized by $\bar{\alpha}_{1:T}$) is an important hyperparameter that determines how much weight we assign to each denoising problem. We find that standard noise schedules for continuous diffusions are not robust for text data. We hypothesize that the discrete nature of text and the rounding step make the model insensitive to noise near $t = 0$. Concretely, adding a small amount of Gaussian noise to a word embedding is unlikely to change its nearest neighbor in the embedding space, which makes denoising an easy task near $t = 0$. To address this, we introduce a new sqrt noise schedule that is better suited for text, shown in Figure 5 and defined by $\bar{\alpha}_t = 1 - \sqrt{t/T + s}$, where $s$ is a small constant that corresponds to the starting noise level. Compared to standard linear and cosine schedules, our sqrt schedule starts with a higher noise level and increases noise rapidly for the first 50 steps. It then slows down the rate of noise injection to avoid spending too many steps on high-noise problems, which may be too difficult to solve well.
The hyperparameters specific to Diffusion-LM include the number of diffusion steps, the architecture of the Diffusion-LM, the embedding dimension, and the noise schedule. We set the number of diffusion steps to 2000, the architecture to BERT-base [7], and the sequence length to 64. For the embedding dimension, we select from $d \in \{16, 64, 128, 256\}$ and choose $d = 16$ for the E2E dataset and $d = 128$ for ROCStories. For the noise schedule, we design the sqrt schedule (Appendix A), which is more robust to different parametrizations and embedding dimensions, as shown in Appendix M. However, once we pick the $x_0$-parametrization (§4.2), the advantage of the sqrt schedule is no longer salient.
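
The sqrt schedule can be sketched in a few lines of Python next to a cosine baseline. This is a minimal sketch, not the authors' code: the value $s = 10^{-4}$, the cosine baseline's offset of 0.008, and all function names are assumptions. Deriving per-step $\beta_t$ from ratios of consecutive $\bar{\alpha}$ values and clipping at 0.999 is a common discretization, assumed here rather than taken from the text; the clipping matters because $1 - \sqrt{1 + s}$ is slightly negative at $t = T$.

```python
import math
from typing import Callable

def sqrt_alpha_bar(t_frac: float, s: float = 1e-4) -> float:
    """Signal level alpha_bar at fractional time t/T under the sqrt schedule.
    s = 1e-4 is an assumed value for the 'small constant' in the text."""
    return 1.0 - math.sqrt(t_frac + s)

def cosine_alpha_bar(t_frac: float, s: float = 0.008) -> float:
    """Cosine baseline schedule, normalized so that alpha_bar(0) == 1."""
    def f(u: float) -> float:
        return math.cos((u + s) / (1.0 + s) * math.pi / 2) ** 2
    return f(t_frac) / f(0.0)

def betas_from_alpha_bar(alpha_bar: Callable[[float], float], T: int,
                         max_beta: float = 0.999) -> list[float]:
    """Per-step noise rates beta_i = 1 - alpha_bar((i+1)/T) / alpha_bar(i/T).

    Assumes alpha_bar(i/T) > 0 for i < T (true for the sqrt schedule when T < 1/s).
    Wherever no clipping occurs, the product of (1 - beta_i) over i < t equals
    alpha_bar(t/T) / alpha_bar(0). Clipping at max_beta keeps beta below 1,
    which the sqrt schedule needs at its final step."""
    betas = []
    for i in range(T):
        ratio = alpha_bar((i + 1) / T) / alpha_bar(i / T)
        betas.append(min(1.0 - ratio, max_beta))
    return betas

T = 2000
# Signal retained at a few steps: sqrt starts below 1 and drops quickly,
# while cosine stays close to 1 over the first steps.
for t in (0, 50, 500, 1000):
    print(f"t={t:4d}  sqrt={sqrt_alpha_bar(t / T):.3f}  cosine={cosine_alpha_bar(t / T):.3f}")
```

Under these assumptions the sqrt curve retains roughly 0.84 of the signal by step 50 of 2000, while the cosine curve is still above 0.99, which matches the "higher starting noise, rapid early increase" behavior described above.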
We train Diffusion-LMs using the AdamW optimizer with a linearly decaying learning rate starting at 1e-4, dropout of 0.1, and a batch size of 64, for a total of 200K training iterations on the E2E dataset and 800K on the ROCStories dataset. Each Diffusion-LM is trained on a single GPU: an NVIDIA RTX A5000, NVIDIA GeForce RTX 3090, or NVIDIA A100.
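
The reported settings can be collected into a small configuration object with a learning-rate schedule. This is a hypothetical sketch: the class, field, and function names are ours, and because the text does not say what the learning rate decays to, decaying linearly to zero at the final iteration is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DiffusionLMConfig:
    """Hypothetical container for the hyperparameters reported above."""
    dataset: str
    embedding_dim: int           # d, chosen from {16, 64, 128, 256}
    train_iters: int
    diffusion_steps: int = 2000
    seq_len: int = 64
    backbone: str = "bert-base"  # BERT-base-sized Transformer denoiser
    noise_schedule: str = "sqrt"
    lr: float = 1e-4             # initial AdamW learning rate
    dropout: float = 0.1
    batch_size: int = 64

E2E = DiffusionLMConfig(dataset="e2e", embedding_dim=16, train_iters=200_000)
ROCSTORIES = DiffusionLMConfig(dataset="rocstories", embedding_dim=128, train_iters=800_000)

def lr_multiplier(step: int, cfg: DiffusionLMConfig) -> float:
    """Linear decay factor: 1.0 at step 0, 0.0 at cfg.train_iters (assumed endpoint)."""
    return 1.0 - min(step, cfg.train_iters) / cfg.train_iters

def learning_rate(step: int, cfg: DiffusionLMConfig) -> float:
    """Absolute learning rate at a given training step."""
    return cfg.lr * lr_multiplier(step, cfg)
```

With PyTorch, `lambda step: lr_multiplier(step, cfg)` could serve as the `lr_lambda` of `torch.optim.lr_scheduler.LambdaLR` wrapped around `torch.optim.AdamW(model.parameters(), lr=cfg.lr)`, since `LambdaLR` multiplies the initial learning rate by the returned factor.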





Towards Latent Diffusion Suitable For Text

arXiv.org Machine Learning

Language diffusion models aim to improve sampling speed and coherence over autoregressive LLMs. We introduce an extension of Neural Flow Diffusion Models (NFDM) for language generation that enables the straightforward application of continuous diffusion models to discrete state spaces. NFDM learns a multivariate forward process from the data, ensuring that the forward process and generative trajectory are a good fit for language modeling. Our model substantially reduces the likelihood gap with autoregressive models of the same size, while achieving sample quality comparable to that of previous latent diffusion models.


Diffusion-LM Improves Controllable Text Generation

Neural Information Processing Systems

Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation. While recent works have demonstrated successes on controlling simple sentence attributes (e.g., sentiment), there has been little progress on complex, fine-grained controls (e.g., syntactic structure). To address this challenge, we develop a new non-autoregressive language model based on continuous diffusions that we call Diffusion-LM. The continuous, hierarchical nature of these intermediate variables enables a simple gradient-based algorithm to perform complex, controllable generation tasks. We demonstrate successful control of Diffusion-LM for six challenging fine-grained control tasks, significantly outperforming prior work.
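
To make the gradient-based control idea concrete, here is a toy sketch in plain Python. It assumes a Gaussian denoising step with mean `mu` and standard deviation `sigma`, and a hypothetical logistic classifier `p(c | x) = sigmoid(w·x + b)` standing in for a real attribute classifier. It runs a few steps of plain gradient ascent on `lam * log N(x; mu, sigma^2 I) + log p(c | x)`, trading off staying close to the diffusion model's proposal (fluency) against satisfying the control. The optimizer, step size, step count, and names are illustrative assumptions, not the paper's settings.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def classifier_prob(x: list[float], w: list[float], b: float) -> float:
    """Toy logistic classifier p(c | x), a stand-in for a learned attribute classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def guided_update(mu: list[float], sigma: float, w: list[float], b: float,
                  lam: float = 1.0, lr: float = 0.1, n_steps: int = 3) -> list[float]:
    """Gradient ascent on lam * log N(x; mu, sigma^2 I) + log p(c | x), starting from x = mu."""
    x = list(mu)
    for _ in range(n_steps):
        p = classifier_prob(x, w, b)
        # Gradient of lam * log N is -lam * (x - mu) / sigma^2;
        # gradient of log sigmoid(w.x + b) is (1 - p) * w.
        grad = [-lam * (xi - mi) / sigma**2 + (1.0 - p) * wi
                for xi, mi, wi in zip(x, mu, w)]
        x = [xi + lr * gi for xi, gi in zip(x, grad)]
    return x

mu, w = [0.0, 0.0], [1.0, -1.0]
weak = guided_update(mu, 1.0, w, 0.0, lam=1.0)     # weaker fluency term, stronger control
strong = guided_update(mu, 1.0, w, 0.0, lam=10.0)  # stronger pull back toward mu
print(classifier_prob(weak, w, 0.0), classifier_prob(strong, w, 0.0))
```

A larger `lam` keeps the latent closer to the model's proposal at the cost of weaker control; because the intermediate variables are continuous, this trade-off can be tuned with ordinary gradients.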


Generative Design of inorganic compounds using deep diffusion language models

arXiv.org Artificial Intelligence

Discovering novel synthesizable and stable materials is of fundamental importance to our society. However, chemical innovation is nontrivial. The material composition and structure must satisfy many stringent constraints, such as charge neutrality, balanced electronegativity, synthesizability, geometric symmetry, and mechanical stability. Historically, new material discovery has relied on expert heuristics and has usually been based on tinkering with existing materials. Several structure generation studies [1, 2] have used brute-force element substitution to generate new structures based on known prototypes. However, this permutation-based approach cannot generate new formula prototypes: it can only use known formulas as templates, so novel compositions can be generated solely by substituting elements. With the development of crystal structure prediction algorithms such as CSMPL [3], TCSP [4], and ParetoCSP [5], generating chemically stable compositions has become an increasingly critical challenge. Stable compositions play a pivotal role in reducing the computational demands of subsequent stages of analysis.
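
The limitation of substitution-based generation shows up in a small sketch: every candidate keeps the prototype's stoichiometry, and only the element at each site changes. The oxidation-state table is a simplified, hypothetical one (one state per element, whereas many elements have several), and the charge-neutrality filter is a basic illustrative check, not the screening used in the cited studies.

```python
from itertools import product

# Simplified oxidation states (assumption: one common state per element).
OXIDATION = {"Li": 1, "Na": 1, "K": 1, "Mg": 2, "Ca": 2, "Cl": -1, "Br": -1, "O": -2}

def substitute(prototype: dict[str, int], choices: dict[str, list[str]]):
    """Yield compositions by assigning a distinct element to each site of a fixed prototype."""
    sites = list(prototype)
    for elems in product(*(choices[s] for s in sites)):
        if len(set(elems)) < len(elems):
            continue  # two sites received the same element; skip
        yield {e: prototype[s] for s, e in zip(sites, elems)}

def is_charge_neutral(comp: dict[str, int]) -> bool:
    """Sum of oxidation state times count must be zero."""
    return sum(OXIDATION[e] * n for e, n in comp.items()) == 0

def formula(comp: dict[str, int]) -> str:
    return "".join(e + (str(n) if n > 1 else "") for e, n in comp.items())

def neutral_candidates(prototype: dict[str, int], choices: dict[str, list[str]]) -> list[str]:
    return [formula(c) for c in substitute(prototype, choices) if is_charge_neutral(c)]

ab = neutral_candidates({"A": 1, "B": 1}, {"A": ["Na", "K", "Mg"], "B": ["Cl", "O"]})
ab2 = neutral_candidates({"A": 1, "B": 2}, {"A": ["Mg", "Ca", "Na"], "B": ["Cl", "Br"]})
print(ab)   # -> ['NaCl', 'KCl', 'MgO']
print(ab2)  # -> ['MgCl2', 'MgBr2', 'CaCl2', 'CaBr2']
```

Every output reuses the 1:1 or 1:2 template it was given; a composition with a different stoichiometry can only appear if that prototype is supplied up front.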


SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control

arXiv.org Artificial Intelligence

Despite the growing success of diffusion models in continuous-valued domains (e.g., images), similar efforts for discrete domains such as text have yet to match the performance of autoregressive language models. In this work, we present SSD-LM, a diffusion-based language model with two key design choices. First, SSD-LM is semi-autoregressive, iteratively generating blocks of text, allowing for flexible output length at decoding time while enabling local bidirectional context updates. Second, it is simplex-based, performing diffusion on the natural vocabulary space rather than a learned latent space, allowing us to incorporate classifier guidance and modular control using off-the-shelf classifiers without any adaptation. We evaluate SSD-LM on unconstrained text generation benchmarks, and show that it matches or outperforms strong autoregressive GPT-2 models across standard quality and diversity metrics, while vastly outperforming diffusion-based baselines. On controlled text generation, SSD-LM also outperforms competitive baselines, with an extra advantage in modularity.
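
Diffusion on the vocabulary space can be illustrated with an "almost-one-hot" representation. This is a sketch under assumptions: it maps a token to a vocabulary-sized logit vector with `+K` at the token's index and `-K` elsewhere (`K = 5` assumed here), applies a standard Gaussian forward noising step, and decodes by argmax. The function names are hypothetical. Because `softmax(x)` is a distribution over the real vocabulary, it can, for example, weight an off-the-shelf classifier's own token embeddings, which is why no adaptation to a learned latent space is needed.

```python
import math
import random

def to_logit_simplex(token_id: int, vocab_size: int, K: float = 5.0) -> list[float]:
    """Almost-one-hot logits: +K at the token's index, -K everywhere else."""
    return [K if i == token_id else -K for i in range(vocab_size)]

def add_noise(x: list[float], alpha_bar: float, rng: random.Random) -> list[float]:
    """Gaussian forward step: sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * xi + s * rng.gauss(0.0, 1.0) for xi in x]

def softmax(x: list[float]) -> list[float]:
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [v / total for v in exps]

def decode(x: list[float]) -> int:
    """Map a (possibly noisy) logit vector back to a token id by argmax."""
    return max(range(len(x)), key=x.__getitem__)

rng = random.Random(0)
clean = to_logit_simplex(token_id=3, vocab_size=10)
slightly_noisy = add_noise(clean, alpha_bar=0.999, rng=rng)
print(decode(slightly_noisy), round(softmax(clean)[3], 4))
```

At low noise the argmax still recovers the original token, and the softmax of the clean vector places almost all probability mass on it.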