ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems
Qiaoling Chen, Zijun Liu, Peng Sun, Shenggui Li, Guoteng Wang, Ziming Liu, Yonggang Wen, Siyuan Feng, Tianwei Zhang
arXiv.org Artificial Intelligence
We identify three critical gaps that hinder the naïve integration of speculative decoding (SD) into reinforcement learning (RL) systems: (G1) diminishing speedups at large batch sizes, (G2) drafter staleness under continual actor updates, and (G3) drafter-induced policy degradation. Among the stages of RL training, generation is consistently the dominant bottleneck (Zhong et al., 2025a). Classic policy optimization methods such as PPO (Schulman et al., 2015; 2017) combine trajectory-level rewards with policy updates, making generation cost central to end-to-end throughput. A natural optimization to address this bottleneck is speculative decoding (Leviathan et al., 2023; Chen et al., 2023), which has already been widely adopted in LLM serving systems (e.g., SGLang (Zheng et al., 2024)). Among the various SD variants, EAGLE-3 (Li et al., 2025) represents the current state of the art. (Figure: Overview of SD in RL training and our proposed ReSpec system.) However, a single static SD configuration cannot provide reliable speedups across diverse RL workloads, as shown in Figure 3. As the actor (i.e., target) model evolves with each policy update, a fixed drafter rapidly becomes misaligned with the actor (G2). Moreover, this variance increases with drafter staleness (G2), causing a higher ratio of impoverished trajectories.
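As an illustrative sketch only (not the paper's implementation), the draft-and-verify loop at the heart of SD can be written in a few lines. The greedy-matching acceptance rule below is a simplification of the probabilistic accept/reject test of Leviathan et al. (2023), and all function names are hypothetical; it also shows why drafter staleness (G2) hurts: a drafter misaligned with the target has its proposals rejected, shrinking the speedup.

```python
def speculative_decode(target, drafter, prefix, k=4, max_new=16):
    """Toy greedy speculative decoding (illustrative, not ReSpec's code).

    The drafter proposes k tokens per round; the target verifies them.
    The first mismatch is replaced by the target's own greedy token and
    the remainder of the draft is discarded. Returns the generated
    sequence and the fraction of drafted tokens that were accepted.
    `target(ctx)` and `drafter(ctx)` stand in for greedy next-token calls.
    """
    out = list(prefix)
    accepted = proposed = 0
    while len(out) - len(prefix) < max_new:
        # Drafter autoregressively proposes k candidate tokens.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = drafter(ctx)
            draft.append(t)
            ctx.append(t)
        proposed += len(draft)
        # Target verifies the draft; a mismatch yields a correction token.
        for t in draft:
            if target(out) == t:
                out.append(t)
                accepted += 1
            else:
                out.append(target(out))
                break
    return out[:len(prefix) + max_new], accepted / proposed
```

With an aligned drafter every proposal is accepted; a "stale" drafter produces the same final output (verification preserves the target's greedy sequence) but at a far lower acceptance rate, mirroring the staleness gap.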
Oct-31-2025