Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding
Ziheng Li, Zexu Sun, Jinman Zhao, Erxue Min, Yongcheng Zeng, Hui Wu, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Xu Chen, Zhi-Hong Deng
arXiv.org Artificial Intelligence
Reinforcement learning with verifiable rewards (RLVR) has achieved remarkable success in enhancing the reasoning capabilities of large language models (LLMs). However, existing RLVR methods often suffer from exploration inefficiency due to a mismatch between the difficulty of the training data and the model's capability: LLMs fail to discover viable reasoning paths when problems are overly difficult, yet learn little new capability from problems that are too simple. Building on this analysis, we propose SEELE, a novel supervision-aided RLVR framework that dynamically adjusts problem difficulty to stay within the high-efficiency region. Unlike previous hint-based approaches, SEELE deliberately and adaptively adjusts the hint length for each problem to reach an optimal difficulty. To determine the appropriate hint length, SEELE employs a multi-round rollout sampling strategy: in each round, it fits an item response theory model to the accuracy-hint pairs collected in preceding rounds and uses it to predict the hint length required for the next round. Experimental results show that SEELE outperforms Group Relative Policy Optimization (GRPO) and supervised fine-tuning (SFT) by +11.8 and +10.5 points, respectively, and surpasses the best previous supervision-aided approach by +3.6 points on average across six math reasoning benchmarks.
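The fit-and-invert step described in the abstract is small enough to sketch. The code below fits a two-parameter logistic item response curve to the accuracy-hint pairs collected in earlier rollout rounds, then inverts the curve to propose the next round's hint length. The logistic form, the hint-ratio parameterization, and the 0.5 target accuracy are illustrative assumptions, not details confirmed by the paper.

```python
# Minimal sketch of an IRT-based hint-length search; the model form and
# parameter names here are assumptions, not SEELE's exact implementation.
import numpy as np
from scipy.optimize import curve_fit

def irt_accuracy(hint_ratio, a, b):
    """Predicted solve rate given the fraction of the reference solution
    revealed as a hint. a = discrimination (how sharply accuracy rises
    with hint length); b = difficulty (hint ratio at 50% accuracy)."""
    return 1.0 / (1.0 + np.exp(-a * (hint_ratio - b)))

def next_hint_ratio(hint_ratios, accuracies, target_acc=0.5):
    """Fit the response curve to (hint ratio, accuracy) pairs gathered in
    earlier rollout rounds, then invert it to get the hint length expected
    to place the problem at the target difficulty."""
    (a, b), _ = curve_fit(
        irt_accuracy, hint_ratios, accuracies,
        p0=[5.0, 0.5], bounds=([0.1, 0.0], [50.0, 1.0]),
    )
    # target = sigmoid(a * (r - b))  =>  r = b + logit(target) / a
    r = b + np.log(target_acc / (1.0 - target_acc)) / a
    return float(np.clip(r, 0.0, 1.0))

# Two earlier rounds: 10% accuracy with no hint, 70% with 60% of the
# solution revealed; propose the hint length for the next round.
print(next_hint_ratio(np.array([0.0, 0.6]), np.array([0.1, 0.7])))
```

In a multi-round loop of this kind, each new round's measured accuracy would be appended to the history and the fit repeated, so the prediction sharpens as pairs accumulate and the problem settles into the high-efficiency band.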
Sep-9-2025