Scaling Speculative Decoding with LOOKAHEADREASONING
Neural Information Processing Systems
Reasoning models excel by generating long chains of thought, but decoding the resulting thousands of tokens is slow. Token-level speculative decoding (SD) helps, but its benefit is capped, because the chance that an entire γ-token guess is correct falls exponentially as γ grows.
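To make the cap concrete, here is a minimal sketch of the arithmetic behind that claim. It assumes each drafted token is accepted independently with the same probability alpha, which is a simplification introduced here for illustration and is not stated in the abstract; the function names are hypothetical. Under that assumption, the chance that all γ drafted tokens survive is alpha^γ, and the expected number of tokens produced per verification step is the geometric sum 1 + alpha + ... + alpha^γ, which can never exceed 1 / (1 - alpha) however large γ gets.

```python
def full_guess_acceptance(alpha: float, gamma: int) -> float:
    """Probability that all gamma drafted tokens are accepted.

    Assumes (for illustration only) that each token is accepted
    independently with the same probability alpha.
    """
    return alpha ** gamma


def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per verification step.

    This is the geometric sum 1 + alpha + ... + alpha**gamma: the accepted
    prefix of the draft plus the one token the target model always supplies.
    """
    if alpha == 1.0:
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    alpha = 0.8  # assumed per-token acceptance rate, chosen arbitrarily
    for gamma in (1, 2, 4, 8, 16, 32):
        p_all = full_guess_acceptance(alpha, gamma)
        tokens = expected_tokens_per_step(alpha, gamma)
        print(f"gamma={gamma:2d}  P(all accepted)={p_all:.4f}  "
              f"E[tokens/step]={tokens:.3f}")
```

Running the loop shows the full-guess probability shrinking geometrically while the expected tokens per step flatten out toward 1 / (1 - alpha), so drafting more tokens per step yields diminishing returns under this model.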
Jun-23-2026, 03:59:14 GMT