Scaling Speculative Decoding with LOOKAHEADREASONING

Neural Information Processing Systems 

Reasoning models excel by generating long chains of thought, but decoding the resulting thousands of tokens is slow. Token-level speculative decoding (SD) helps, but its benefit is capped because the chance that an entire γ-token guess is correct falls exponentially as γ grows.
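To make the exponential decay concrete, the sketch below works under a simplifying assumption that is not stated in the text above: each drafted token is accepted independently with the same probability `alpha`. Under that assumption a full γ-token draft is accepted with probability alpha^γ, and the expected number of tokens produced per verification step is the geometric sum 1 + alpha + ... + alpha^γ, which can never exceed 1 / (1 - alpha) however large γ becomes. The names `alpha`, `full_accept_probability`, and `expected_tokens_per_step` are illustrative, not taken from the paper.

```python
def full_accept_probability(alpha: float, gamma: int) -> float:
    """Probability that all `gamma` drafted tokens are accepted.

    Assumes (hypothetically) an i.i.d. per-token acceptance rate `alpha`.
    """
    return alpha ** gamma


def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per verification step.

    Counts the accepted draft prefix plus the one token the target model
    always contributes, i.e. sum_{k=0}^{gamma} alpha**k.
    """
    if alpha == 1.0:
        # Every draft token is accepted, plus one token from the target.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    alpha = 0.8  # assumed acceptance rate, chosen only for illustration
    for gamma in (1, 2, 5, 10, 20):
        p = full_accept_probability(alpha, gamma)
        e = expected_tokens_per_step(alpha, gamma)
        print(f"gamma={gamma:2d}  P(all accepted)={p:.4f}  E[tokens/step]={e:.3f}")
```

With an assumed `alpha` of 0.8, the expected tokens per step can approach but never pass 5, so drafting more tokens per step yields diminishing returns. Real acceptance rates vary by position and context, so this is an idealized model only.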