Self-conditioned Embedding Diffusion for Text Generation
Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond
arXiv.org Artificial Intelligence
Can continuous diffusion models bring the same performance breakthrough to natural language that they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens into a continuous space of embeddings, as is standard in language modeling. Through qualitative and quantitative evaluation, we show that our text diffusion models generate samples comparable with those produced by standard autoregressive language models, while being in theory more efficient on accelerator hardware at inference time. Our work paves the way for scaling up diffusion models for text, similarly to autoregressive models, and for improving performance with recent refinements to continuous diffusion.

Continuous diffusion models (Sohl-Dickstein et al., 2015) have taken the world of image generation by storm, advancing the state of the art further than ever before (Rombach et al., 2021; Ramesh et al., 2022). Diffusion for language is indeed an attractive prospect. Compared to autoregressive (AR) models (Bengio et al., 2000; Sutskever et al., 2011; Austin et al., 2021; Hoffmann et al., 2022), diffusion models can predict all tokens in a sequence at once. This allows for bidirectional rather than causal attention, increasing interactions between tokens and potentially leading to more coherent samples. Diffusion models can also make better use of hardware accelerators during inference than AR models, since computations are parallelizable over the sequence axis.
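As a rough illustration of what projecting tokens into a continuous space of embeddings can look like in practice, the sketch below embeds a token sequence, corrupts every position at once with Gaussian noise, and maps continuous vectors back to tokens by nearest-neighbour lookup. It is a minimal sketch, not the method of this paper: the random embedding table, the variance-preserving noising rule, the nearest-neighbour decoding, and all names (`embed`, `add_noise`, `nearest_tokens`, `alpha_bar`) are assumptions chosen for illustration.

```python
import numpy as np

def embed(tokens, table):
    # Look up one embedding vector per token id: (seq_len,) -> (seq_len, dim).
    return table[tokens]

def add_noise(x0, alpha_bar, rng):
    # Assumed variance-preserving corruption, applied to all positions at once:
    # x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

def nearest_tokens(x, table):
    # Map each continuous vector back to the id of the closest embedding.
    dists = ((x[:, None, :] - table[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=-1)

rng = np.random.default_rng(0)
vocab_size, dim = 16, 8  # toy sizes, purely illustrative

# Hypothetical embedding table: random unit-norm vectors, not learned.
table = rng.standard_normal((vocab_size, dim))
table /= np.linalg.norm(table, axis=1, keepdims=True)

tokens = np.array([3, 1, 4, 1, 5])
x0 = embed(tokens, table)                    # clean embeddings
xt = add_noise(x0, alpha_bar=0.99, rng=rng)  # lightly noised embeddings
decoded = nearest_tokens(xt, table)          # back to discrete token ids
```

Every operation here acts on the whole `(seq_len, dim)` array in one step, which is the sense in which computations are parallelizable over the sequence axis; an AR model would instead produce the five tokens one after another.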
Nov-8-2022