Accelerating Blockwise Parallel Language Models with Draft Refinement

Mar-19-2026, 16:30:34 GMT–Neural Information Processing Systems

Autoregressive language models have advanced remarkably, yet their potential is often limited by the slow inference that sequential token generation entails. Blockwise parallel decoding (BPD) was proposed by Stern et al. [42] to improve the inference speed of language models by simultaneously predicting multiple future tokens, termed block drafts, which are subsequently verified by the autoregressive model.
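The draft-then-verify loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: `verify_block`, `blockwise_decode`, `toy_next`, and `toy_draft` are hypothetical names, the base model is assumed to decode greedily, and verification is written as a sequential loop for readability, whereas a real system would score the whole draft in one parallel forward pass.

```python
from typing import Callable, List

Token = int
# Hypothetical interfaces (assumptions, not from the paper):
NextTokenFn = Callable[[List[Token]], Token]          # greedy next token of the base model
DraftFn = Callable[[List[Token], int], List[Token]]   # proposes k future tokens at once


def verify_block(prefix: List[Token], draft: List[Token],
                 next_token: NextTokenFn) -> List[Token]:
    """Return the longest prefix of `draft` that the base model would also produce."""
    accepted: List[Token] = []
    for tok in draft:
        if next_token(prefix + accepted) != tok:
            break  # first mismatch: discard this token and the rest of the draft
        accepted.append(tok)
    return accepted


def blockwise_decode(prompt: List[Token], draft_fn: DraftFn, next_token: NextTokenFn,
                     block_size: int, max_new_tokens: int) -> List[Token]:
    """Decode by repeatedly drafting a block and keeping only the verified part."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        draft = draft_fn(out, block_size)
        out.extend(verify_block(out, draft, next_token))
        # Always append one base-model token so every iteration makes progress.
        out.append(next_token(out))
    return out[:len(prompt) + max_new_tokens]


# Toy stand-ins: the "base model" counts upward modulo 10; the drafter gets
# the first two positions right and then falls back to guessing 0.
def toy_next(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token], k: int) -> List[Token]:
    return [(ctx[-1] + i + 1) % 10 if i < 2 else 0 for i in range(k)]


result = blockwise_decode([3], toy_draft, toy_next, block_size=4, max_new_tokens=6)
print(result)
```

Because only tokens that agree with the base model are kept, the output in this sketch matches plain greedy decoding; the speedup in practice comes from accepting several draft tokens per verification step.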

artificial intelligence, natural language, proceedings

Technology:
- Information Technology > Artificial Intelligence > Natural Language (0.74)