Think Before You Prune: Self-Reflective Structured Pruning for Reasoning Language Models
Wang, Ziyan, Diao, Enmao, Le, Qi, Wang, Pu, Wang, Guanchu, Lee, Minwoo, Yeh, Shu-ping, Yang, Li
–arXiv.org Artificial Intelligence
Reasoning LLMs (RLMs) such as OpenAI o1, DeepSeek-R1, and Qwen3 deliver strong multi-step reasoning through chain-of-thought generation, but their large model sizes and lengthy decode-time outputs make them costly to deploy and unsuitable for resource-constrained settings. Pruning offers a promising way to reduce compute and memory cost by removing unimportant parameters. However, despite their success on standard LLMs, existing pruning methods severely damage RLMs: even moderate sparsity (e.g., 20%) can collapse accuracy and completely disrupt the model's reasoning coherence. We begin by analyzing why existing pruning pipelines fail on reasoning LLMs and find that their brittleness largely stems from a mismatch among the calibration data, the pruning objective, and the model's decode-time reasoning behavior. Our study further shows that the most reliable calibration signal comes not from human-written labels but from the model's own self-generated reasoning traces, which more accurately reflect its inference distribution. Guided by these insights, we introduce RESP, a self-reflective structured pruning framework that aligns pruning decisions with the model's reasoning dynamics through three components: self-generated calibration, decode-only gradient-based importance estimation, and progressive regeneration that maintains calibration fidelity as sparsity increases. Experiments on Qwen3-8B demonstrate that RESP markedly outperforms existing structured pruning methods on both GSM8K and MathQA, preserving near-dense accuracy at 20-30% sparsity and substantially mitigating performance collapse at higher sparsity levels. At 40% sparsity, RESP attains 81.3% accuracy on GSM8K and 59.6% on MathQA, surpassing the strongest baselines by 66.87% and 47%, respectively.
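As a rough illustration of what "decode-only gradient-based importance estimation" could look like, the toy sketch below scores each prunable unit by a first-order Taylor term |w * dL/dw|, with the loss summed only over decode (generated) positions. This is an assumption made for illustration, not the authors' implementation: the abstract does not specify the scoring rule, and the linear model, squared-error loss, and the names `decode_only_importance` and `prune_lowest` are all hypothetical. In the toy, each scalar weight stands in for one structural unit such as a channel or head.

```python
from typing import List, Sequence


def decode_only_importance(
    weights: Sequence[float],
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    is_decode: Sequence[bool],
) -> List[float]:
    """Score each unit by |w_j * dL/dw_j|, with L summed over decode positions only.

    Toy assumption: a linear model pred = sum_j w_j * x_j with squared-error loss,
    so the gradient can be written by hand instead of using autograd.
    """
    grads = [0.0] * len(weights)
    for x, y, keep in zip(features, targets, is_decode):
        if not keep:
            # Prompt (prefill) positions are skipped: they add nothing to the loss.
            continue
        pred = sum(w * xj for w, xj in zip(weights, x))
        for j, xj in enumerate(x):
            grads[j] += 2.0 * (pred - y) * xj  # d/dw_j of (pred - y)^2
    return [abs(w * g) for w, g in zip(weights, grads)]


def prune_lowest(
    weights: Sequence[float], scores: Sequence[float], sparsity: float
) -> List[float]:
    """Zero out the lowest-scoring fraction of units (hypothetical helper)."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda j: scores[j])
    pruned = set(order[:n_prune])
    return [0.0 if j in pruned else w for j, w in enumerate(weights)]


# Three token positions; the first is a prompt token, the last two are generated.
weights = [1.0, 0.5, -2.0, 0.1]
features = [[1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 1.0, 1.0]]
targets = [0.0, -2.0, -1.0]
is_decode = [False, True, True]

scores = decode_only_importance(weights, features, targets, is_decode)
pruned_weights = prune_lowest(weights, scores, sparsity=0.5)
print(pruned_weights)
```

In a real RLM the same idea would presumably use autograd on the language-modeling loss with prompt tokens masked out, and the calibration sequences would be the model's own generated reasoning traces, regenerated as sparsity grows.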
Dec-3-2025
- Country:
  - Asia > China
    - Sichuan Province > Chengdu (0.04)
  - North America > United States
    - California > Santa Clara County
      - Santa Clara (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.28)
    - North Carolina > Mecklenburg County
      - Charlotte (0.05)
- Genre:
  - Research Report (1.00)
- Technology: