SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control

Han, Xiaochuang; Kumar, Sachin; Tsvetkov, Yulia

Jun-26-2023, arXiv.org Artificial Intelligence

Despite the growing success of diffusion models in continuous-valued domains (e.g., images), similar efforts for discrete domains such as text have yet to match the performance of autoregressive language models. In this work, we present SSD-LM -- a diffusion-based language model with two key design choices. First, SSD-LM is semi-autoregressive, iteratively generating blocks of text, allowing for flexible output length at decoding time while enabling local bidirectional context updates. Second, it is simplex-based, performing diffusion on the natural vocabulary space rather than a learned latent space, allowing us to incorporate classifier guidance and modular control using off-the-shelf classifiers without any adaptation. We evaluate SSD-LM on unconstrained text generation benchmarks, and show that it matches or outperforms strong autoregressive GPT-2 models across standard quality and diversity metrics, while vastly outperforming diffusion-based baselines. On controlled text generation, SSD-LM also outperforms competitive baselines, with an extra advantage in modularity.
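The semi-autoregressive decoding described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_denoise_block`, `generate`, the tiny vocabulary, and the "favour the next token in a fixed cycle" rule are all hypothetical stand-ins for a trained diffusion model. The sketch only shows the control flow: each block starts from random scores over the vocabulary, all positions in the block are refined together over several steps, the finished block is appended to the context, and decoding stops early when a stop token appears, so output length is not fixed in advance.

```python
import random


def toy_denoise_block(context, block_len, vocab, steps, rng):
    """Hypothetical stand-in for a diffusion denoiser over one block of tokens."""
    # Start from bounded random scores: one score per vocabulary item per position.
    scores = [[rng.uniform(-1.0, 1.0) for _ in vocab] for _ in range(block_len)]
    for _ in range(steps):
        for pos in range(block_len):
            # Toy "model" (an assumption): favour the next token in a fixed cycle.
            target = (len(context) + pos) % len(vocab)
            scores[pos][target] += 1.0
    # Decode all positions of the block at once by taking the best-scoring token.
    return [vocab[max(range(len(vocab)), key=row.__getitem__)] for row in scores]


def generate(prompt, vocab, block_len=3, max_blocks=4, stop_token="<eos>",
             steps=5, seed=0):
    """Generate text block by block; each new block is conditioned on all earlier tokens."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_blocks):
        block = toy_denoise_block(tokens, block_len, vocab, steps, rng)
        if stop_token in block:
            # Keep tokens before the stop token and end early: length is flexible.
            tokens.extend(block[:block.index(stop_token)])
            break
        tokens.extend(block)
    return tokens


VOCAB = ["a", "b", "c", "d", "<eos>"]
print(generate(["a"], VOCAB))
```

In a real system the inner loop would call a trained network, and the per-position scores would be the model's representation over the vocabulary rather than hand-set numbers.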

large language model, machine learning, natural language

Country:
- South America > Ecuador (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.04)
    - New York > New York County
      - New York City (0.04)
    - California > San Diego County
      - San Diego (0.04)
- Europe
  - Ukraine (0.04)
  - Western Europe (0.04)
  - Russia > Central Federal District
    - Moscow Oblast > Moscow (0.04)
  - Italy > Calabria
    - Catanzaro Province > Catanzaro (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
- Asia
  - Russia (0.14)
  - China (0.04)

Genre:
- Research Report (0.64)

Industry:
- Government > Military (0.67)
- Information Technology > Security & Privacy (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Machine Translation (0.68)
    - Chatbot (0.50)
    - Large Language Model (0.50)
  - Machine Learning > Neural Networks
    - Deep Learning (0.50)
