Accelerating Blockwise Parallel Language Models with Draft Refinement
–Neural Information Processing Systems
Autoregressive language models have advanced remarkably, yet their practical use is often limited by slow inference, because tokens are generated one at a time. Blockwise parallel decoding (BPD) was proposed by Stern et al. [42] to improve the inference speed of language models by simultaneously predicting multiple future tokens, termed block drafts, which are subsequently verified by the autoregressive model.
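The draft-then-verify loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `accept_draft`, `blockwise_decode`, `toy_next_token`, and `toy_draft` are hypothetical, the "model" is a toy counting rule, and verification is written as a sequential loop for readability, whereas a real system would score all draft positions in a single parallel forward pass. The sketch assumes greedy verification, where a draft token is kept only if it matches what the autoregressive model would have produced itself.

```python
from typing import Callable, List

Token = int
NextTokenFn = Callable[[List[Token]], Token]      # greedy autoregressive model (assumed interface)
DraftFn = Callable[[List[Token]], List[Token]]    # proposes a block draft (assumed interface)


def accept_draft(context: List[Token], draft: List[Token], next_token: NextTokenFn) -> List[Token]:
    """Return the longest prefix of `draft` that the base model agrees with."""
    accepted: List[Token] = []
    for tok in draft:
        # Keep the draft token only if the base model would emit the same token.
        if next_token(context + accepted) != tok:
            break
        accepted.append(tok)
    return accepted


def blockwise_decode(prompt: List[Token], draft_fn: DraftFn,
                     next_token: NextTokenFn, max_new_tokens: int) -> List[Token]:
    """Draft a block, verify it, and repeat until enough tokens are produced."""
    out = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(out) < target_len:
        out.extend(accept_draft(out, draft_fn(out), next_token))
        # Always add one token from the base model so the loop makes progress
        # even when the whole draft is rejected.
        out.append(next_token(out))
    return out[:target_len]


# Toy stand-ins (hypothetical): the "model" counts upward modulo 10,
# and the drafter gets its first two guesses right and the third wrong.
def toy_next_token(seq: List[Token]) -> Token:
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[Token]) -> List[Token]:
    last = seq[-1]
    return [(last + 1) % 10, (last + 2) % 10, (last + 4) % 10]


if __name__ == "__main__":
    print(blockwise_decode([0], toy_draft, toy_next_token, max_new_tokens=6))
```

Under these assumptions the output is identical to plain greedy decoding; the potential speedup comes only from how many draft tokens are accepted per verification step, which is why the quality of the block drafts matters.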
Mar-19-2026, 16:30:34 GMT