Scaling Speculative Decoding with LOOKAHEADREASONING

Jun-23-2026, 03:59:14 GMT–Neural Information Processing Systems

Reasoning models excel by generating long chains of thought, but decoding the resulting thousands of tokens is slow. Token-level speculative decoding (SD) helps, but its benefit is capped, because the chance that an entire γ-token guess is correct falls exponentially as γ grows.
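
The exponential fall-off can be illustrated with a small sketch. It assumes each drafted token is accepted independently with the same probability alpha, which is a simplification and not something stated in the abstract; the function names and the value alpha = 0.8 are hypothetical and chosen only for illustration. Under that assumption, the chance that all γ drafted tokens are accepted is alpha^γ, and the expected number of tokens produced per verification step is the geometric sum 1 + alpha + ... + alpha^γ, which can never exceed 1 / (1 - alpha) however large γ is made.

```python
def prob_all_accepted(alpha: float, gamma: int) -> float:
    """Chance that an entire gamma-token draft is accepted.

    Assumes (for illustration only) that each token is accepted
    independently with probability alpha.
    """
    return alpha ** gamma


def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens produced per verification step.

    Geometric sum 1 + alpha + ... + alpha**gamma under the same
    independence assumption; the leading 1 is the token the target
    model always contributes after verification.
    """
    if alpha == 1.0:
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    alpha = 0.8  # hypothetical per-token acceptance probability
    for gamma in (1, 2, 4, 8, 16, 32):
        p = prob_all_accepted(alpha, gamma)
        e = expected_tokens_per_step(alpha, gamma)
        print(f"gamma={gamma:2d}  P(all accepted)={p:.4f}  E[tokens/step]={e:.3f}")
```

In this toy model, lengthening the draft quickly stops paying off: the expected tokens per step flatten toward the 1 / (1 - alpha) ceiling while the drafting cost keeps growing with γ.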

large language model, machine learning, natural language, (19 more...)

Genre:
- Research Report > Experimental Study (1.00)

Industry:
- Information Technology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Neural Networks (0.68)
  - Natural Language
    - Large Language Model (0.71)
    - Chatbot (0.68)
