SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification

Kanghoon Yoon, Minsub Kim, Sungjae Lee, Joonhyung Lee, Sunghyeon Woo, Yeonjun In, Se Jung Kwon, Chanyoung Park, Dongsoo Lee

arXiv.org Artificial Intelligence 

Empirical scaling laws establish a relationship between parameter count and model capability, as evidenced by models with hundreds of billions of parameters achieving state-of-the-art results on benchmarks (Kaplan et al., 2020; Grattafiori et al., 2024). However, autoregressive generation requires accessing all model parameters on every forward pass, creating a memory-bandwidth bottleneck that dominates token generation latency. Furthermore, current trends toward more sophisticated LLM applications, such as multi-hop reasoning (Wei et al., 2022), tool integration (Patil et al., 2024), and extended reasoning (Yang et al., 2025; DeepMind, 2025), produce longer output sequences, amplifying the computational burden of inference.

One prominent approach to reducing inference latency is Speculative Decoding (SD), which partially parallelizes the generation process (Leviathan et al., 2023; Chen et al., 2023). Standard SD deploys a computationally efficient draft model to propose candidate token sequences, which the target model (the model of interest) then validates in parallel. The acceptance criterion for draft tokens relies on probability-based alignment verification: a draft token is accepted when its likelihood under the target model meets or exceeds its likelihood under the draft model.
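As a concrete illustration, the full stochastic acceptance rule of Leviathan et al. (2023) extends the deterministic condition above: when the target likelihood p(x) meets or exceeds the draft likelihood q(x), the token is always accepted; otherwise it is accepted with probability p(x)/q(x), and on rejection a replacement token is sampled from the normalized residual max(0, p - q), which preserves the target model's output distribution. A minimal sketch (function names and the NumPy-based setup are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def verify_draft(p_target, q_draft, draft_tokens):
    """Speculative-decoding verification (Leviathan et al., 2023 rule).

    p_target, q_draft: per-position probability vectors over the vocabulary
    from the target and draft models; draft_tokens: the drafted token ids.
    Returns the accepted prefix, plus one corrected token on first rejection.
    """
    output = []
    for t, p, q in zip(draft_tokens, p_target, q_draft):
        # Accept token t with probability min(1, p(t)/q(t)).
        if rng.random() < min(1.0, p[t] / q[t]):
            output.append(t)
        else:
            # On rejection, resample from the residual max(0, p - q),
            # normalized; this keeps the overall output distributed
            # exactly as the target model alone would generate.
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            output.append(int(rng.choice(len(p), p=residual)))
            break  # tokens after a rejection are discarded
    return output
```

Note that when p(t) >= q(t) the acceptance probability is exactly 1, recovering the deterministic criterion described above; the probabilistic branch handles the remaining case without biasing the output distribution.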