Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass
Sarah Pratt
Neural Information Processing Systems
Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions. Under the hood, language models support this by running an autoregressive inference pass to produce each draft. Consequently, providing k drafts to the user requires running an expensive language model k times. To alleviate this cost, we propose Superposed Decoding, a new decoding algorithm that generates k drafts at the computation cost of a single autoregressive inference pass. We achieve this by feeding a superposition of the most recent token embeddings from the k drafts as input to the next decoding step of the language model.
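The abstract only sketches the mechanism, so the following is a minimal illustrative sketch, not the paper's implementation. It assumes a Hugging Face GPT-2, uniform mixing weights over the k drafts, and a deliberately simplified rule that extends draft i with the i-th most likely next token; the paper's actual draft initialization, weighting, and selection procedure are not reproduced here.

```python
# Minimal sketch of the superposition step (assumptions noted above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
embed = model.get_input_embeddings()

k, steps = 3, 8
prompt_ids = tokenizer("The weather tomorrow will be", return_tensors="pt").input_ids

with torch.no_grad():
    # One pass over the shared prompt; branch into k drafts by taking
    # the k most likely first tokens.
    logits = model(prompt_ids).logits[0, -1]
    drafts = [[t.item()] for t in logits.topk(k).indices]

    # The running input is the prompt embeddings plus, at each step, one
    # superposed embedding that stands in for all k drafts at once.
    inputs = embed(prompt_ids)  # (1, T, d)
    for _ in range(steps):
        last = torch.tensor([d[-1] for d in drafts])                  # (k,)
        mixed = embed(last).mean(dim=0, keepdim=True).unsqueeze(0)    # (1, 1, d)
        inputs = torch.cat([inputs, mixed], dim=1)
        logits = model(inputs_embeds=inputs).logits[0, -1]
        # Simplification: extend draft i with the i-th most likely token
        # (a placeholder for the paper's actual draft-selection rule).
        for i, t in enumerate(logits.topk(k).indices):
            drafts[i].append(t.item())

for d in drafts:
    print(tokenizer.decode(prompt_ids[0].tolist() + d))
```

Note the key property the sketch preserves: every generation step runs the model exactly once on a single input sequence, regardless of k, because the k drafts' latest tokens are collapsed into one mixed embedding before the forward pass.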