DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding

Guanghao Li, Zhihui Fu, Min Fang, Qibin Zhao, Ming Tang, Chun Yuan, Jun Wang

Oct-6-2025–arXiv.org Artificial Intelligence

As large language models (LLMs) scale up, accuracy improves, but the autoregressive (AR) nature of decoding increases latency since each token requires a serial forward pass. Speculative decoding addresses this by employing a fast drafter to propose multi-token drafts, which are then verified in parallel by the target model. However, many deployments still rely on AR drafters, where sequential passes limit wall-clock gains. We revisit the drafting stage and present DiffuSpec, a training-free drop-in framework that uses a pretrained diffusion language model (DLM) to produce multi-token drafts in a single forward pass, while remaining compatible with standard AR verifiers. Because DLM drafts are generated under bidirectional conditioning, parallel per-position candidates form a token lattice in which the locally highest-probability token at each position need not form a causal left-to-right path. Moreover, DLM drafting requires pre-specifying a draft length, inducing a speed-quality trade-off. To address these challenges, we introduce two practical components: (i) a causal-consistency path search (CPS) over this lattice that extracts a left-to-right path aligned with AR verification; and (ii) an adaptive draft-length (ADL) controller that adjusts the next proposal size based on recent acceptance feedback and realized generated length. Across benchmarks, DiffuSpec yields up to 3× wall-clock speedup, establishing diffusion-based drafting as a robust alternative to autoregressive drafters for speculative decoding.
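The draft-then-verify loop can be illustrated with a minimal sketch of the greedy (temperature-0) verification rule, a common special case of speculative decoding; the abstract does not specify DiffuSpec's exact acceptance rule, so this is not presented as the paper's verifier. The names `verify_greedy`, `TargetArgmax`, and `toy_target_argmax` are hypothetical, and the toy "target model" simply predicts the previous token plus one. The point is that one parallel target pass scores the whole draft, the longest agreeing prefix is kept, and the first mismatch (or a bonus token after full acceptance) comes from the target.

```python
from typing import Callable, List, Sequence

# Assumed interface: one parallel target forward pass over `seq` returns, for every
# position i, the target model's greedy prediction for position i + 1.
TargetArgmax = Callable[[Sequence[int]], List[int]]


def verify_greedy(prefix: Sequence[int], draft: Sequence[int],
                  target_argmax: TargetArgmax) -> List[int]:
    """Return the tokens committed this round under greedy verification."""
    if not prefix:
        raise ValueError("prefix must contain at least one token")
    seq = list(prefix) + list(draft)
    preds = target_argmax(seq)  # single parallel pass over prefix + draft
    committed: List[int] = []
    for j, tok in enumerate(draft):
        target_tok = preds[len(prefix) + j - 1]  # target's choice for draft position j
        if tok != target_tok:
            committed.append(target_tok)  # first mismatch: keep the target's token, drop the rest
            return committed
        committed.append(tok)
    # Every draft token matched, so the target's next prediction comes for free.
    committed.append(preds[len(seq) - 1])
    return committed


def toy_target_argmax(seq: Sequence[int]) -> List[int]:
    # Hypothetical stand-in for a target LLM: always predicts "previous token + 1".
    return [t + 1 for t in seq]


print(verify_greedy([1, 2], [3, 4, 9, 10], toy_target_argmax))  # [3, 4, 5]
print(verify_greedy([1], [2, 3], toy_target_argmax))            # [2, 3, 4]
```

Under this rule the output matches what the target alone would produce greedily, so draft quality affects only speed (how many tokens are committed per target pass), not correctness.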
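The lattice problem and the idea of a causal-consistency search can be made concrete with a small sketch. The abstract does not give CPS's scoring function, so this swaps in a plain left-to-right beam search whose score is the DLM log-probability plus `lam` times a causal log-probability from a hypothetical bigram model. `causal_path_search`, `bigram_logp`, `lam`, `beam_width`, and the toy lattice are all illustrative assumptions. In the toy lattice the per-position argmax yields the incoherent path "New Francisco", while the causally scored search recovers "New York".

```python
import math
from typing import Callable, Dict, List, Tuple

# Per-position candidates from one DLM forward pass: (token, DLM log-prob).
Lattice = List[List[Tuple[str, float]]]
# Hypothetical causal scorer: log p(token | previous token), e.g. from a small n-gram model.
CausalLogProb = Callable[[str, str], float]


def causal_path_search(lattice: Lattice, causal_logp: CausalLogProb, start: str,
                       beam_width: int = 4, lam: float = 1.0) -> List[str]:
    """Left-to-right beam search over the lattice, mixing DLM and causal scores."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for candidates in lattice:
        expanded: List[Tuple[float, List[str]]] = []
        for score, path in beams:
            prev = path[-1] if path else start
            for tok, dlm_logp in candidates:
                expanded.append((score + dlm_logp + lam * causal_logp(prev, tok), path + [tok]))
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_width]  # keep only the best partial paths
    return beams[0][1]


# Toy lattice whose per-position argmax ("New", "Francisco") is not a coherent path.
LATTICE: Lattice = [
    [("New", math.log(0.6)), ("San", math.log(0.4))],
    [("Francisco", math.log(0.55)), ("York", math.log(0.45))],
]
BIGRAM: Dict[Tuple[str, str], float] = {
    ("<s>", "New"): 0.5, ("<s>", "San"): 0.5,
    ("New", "York"): 0.9, ("New", "Francisco"): 0.01,
    ("San", "Francisco"): 0.9, ("San", "York"): 0.01,
}


def bigram_logp(prev: str, tok: str) -> float:
    # Unseen bigrams get a small floor probability instead of log(0).
    return math.log(BIGRAM.get((prev, tok), 1e-6))


argmax_path = [max(c, key=lambda item: item[1])[0] for c in LATTICE]
print(argmax_path)                                      # ['New', 'Francisco']
print(causal_path_search(LATTICE, bigram_logp, "<s>"))  # ['New', 'York']
```

A beam rather than exhaustive enumeration keeps the cost linear in draft length; `lam` trades off fidelity to the DLM's marginals against left-to-right coherence, which is what the AR verifier ultimately checks.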
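An adaptive draft-length controller of the kind described can be sketched as a feedback loop. This is a hypothetical design, not the paper's ADL rule. It keeps an exponential moving average of accepted tokens, probes `grow` tokens further when an entire draft is accepted, and clamps the next proposal to `[min_len, max_len]`. Here "realized generated length" is read as the number of draft tokens the DLM actually produced in a round (for example, before an end-of-sequence token); that reading is an assumption, as are the class name and every parameter.

```python
import math
from dataclasses import dataclass


@dataclass
class AdaptiveDraftLength:
    """Hypothetical ADL-style controller: EMA of acceptance feedback -> next draft length."""
    min_len: int = 2
    max_len: int = 16
    alpha: float = 0.5  # weight on the newest observation in the moving average
    grow: int = 2       # extra tokens to probe when a whole draft was accepted
    ema: float = 4.0    # initial estimate of how many tokens will be accepted

    def propose(self) -> int:
        # Next draft length, clamped to [min_len, max_len].
        return max(self.min_len, min(self.max_len, math.ceil(self.ema)))

    def update(self, accepted: int, realized_len: int) -> int:
        """accepted: draft tokens the verifier kept; realized_len: draft tokens actually produced."""
        # If everything the drafter produced was accepted, the draft was probably too short.
        target = accepted + (self.grow if accepted >= realized_len else 0)
        self.ema = self.alpha * target + (1 - self.alpha) * self.ema
        return self.propose()


ctrl = AdaptiveDraftLength()
print(ctrl.update(accepted=6, realized_len=6))  # 6: full acceptance pushes the length up
print(ctrl.update(accepted=1, realized_len=6))  # 4: an early rejection pulls it back down
print(ctrl.update(accepted=0, realized_len=4))  # 2: repeated rejections shrink it toward min_len
```

The clamp bounds the cost of a bad guess in either direction: long drafts waste DLM compute when rejected early, while short drafts leave speedup on the table when the verifier would have accepted more, which is the speed-quality trade-off the abstract describes.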
