Speculative Decoding and Beyond: An In-Depth Survey of Techniques

Yunhai Hu, Zining Liu, Zhenyuan Dong, Tianfan Peng, Bradley McDanel, Sai Qian Zhang

March 3, 2025, arXiv.org Artificial Intelligence

Sequential dependencies present a fundamental bottleneck in deploying large-scale autoregressive models, particularly for real-time applications. While traditional optimization approaches such as pruning and quantization often compromise model quality, recent advances in generation-refinement frameworks demonstrate that this trade-off can be significantly mitigated. This survey presents a comprehensive taxonomy of generation-refinement frameworks, analyzing methods across autoregressive sequence tasks. We categorize methods by their generation strategies (from simple n-gram prediction to sophisticated draft models) and their refinement mechanisms (including single-pass verification and iterative approaches). Through systematic analysis of both algorithmic innovations and system-level implementations, we examine deployment strategies across computing environments and explore applications spanning text, image, and speech generation. This examination of both theoretical frameworks and practical implementations provides a foundation for future research in efficient autoregressive decoding.

Index Terms: Large Language Model, Speculative Decoding, Computer System, Distributed System.
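
To make the generation-refinement pattern concrete, here is a minimal sketch of draft-then-verify decoding under greedy decoding. It is an illustration, not the survey's own method: `target` and `draft` are hypothetical toy functions standing in for a large model and a cheap drafter (which could equally be an n-gram predictor), and `k` is an assumed draft length. The drafter proposes up to `k` tokens; the target checks them, keeps the longest matching prefix, and contributes one corrected (or bonus) token, so the output matches plain greedy decoding with the target alone.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical stand-in for a model: maps a prefix to its greedy next token.
Model = Callable[[List[Token]], Token]


def greedy_decode(target: Model, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[Token],
                       max_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Draft up to k tokens cheaply, then verify them against the target.

    Returns the generated sequence and the number of verification passes.
    """
    seq = list(prompt)
    goal = len(prompt) + max_new
    passes = 0
    while len(seq) < goal:
        # Generation: the drafter proposes tokens autoregressively on its own output.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # Refinement: check every drafted position against the target. In a real
        # system these predictions come from one batched forward pass.
        passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: take target's token, stop
                break
            accepted.append(tok)
        else:
            # Every draft matched, so the same pass also yields one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], passes  # the bonus token may overshoot the budget


# Hypothetical toy models (assumption): the drafter agrees with the target
# except at positions where the sequence length is a multiple of 4.
def target(seq: List[Token]) -> Token:
    return (3 * seq[-1] + len(seq)) % 10


def draft(seq: List[Token]) -> Token:
    return target(seq) if len(seq) % 4 else 0


out, passes = speculative_decode(target, draft, [1], max_new=20, k=4)
```

Because every accepted token is either confirmed or supplied by the target, the result equals `greedy_decode(target, [1], 20)`, while `passes` counts how many target verification rounds were needed. Sampling-based variants replace the exact-match check with a probabilistic acceptance rule; that case is not shown here.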

Country:
- North America > United States
  - Pennsylvania (0.04)
  - Florida > Miami-Dade County
    - Miami (0.04)
- Asia
  - Thailand > Bangkok
    - Bangkok (0.04)
  - China > Guangdong Province
    - Shenzhen (0.04)

Genre:
- Research Report (1.00)
- Overview (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Vision (0.94)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)