The Impact of Quantization on Large Reasoning Model Reinforcement Learning
Kumar, Medha; Xu, Zifei; Wang, Xin; Webb, Tristan
arXiv.org Artificial Intelligence
Strong reasoning capabilities can now be achieved by large-scale reinforcement learning (RL) without any supervised fine-tuning. Although post-training quantization (PTQ) and quantization-aware training (QAT) are well studied in the context of fine-tuning, how quantization impacts RL in large reasoning models (LRMs) remains an open question. To answer this question, we conducted systematic experiments and discovered a significant gap in reasoning performance on mathematical benchmarks between post-RL quantized models and their quantization-aware RL optimized counterparts. Our findings suggest that quantization-aware RL training negatively impacts the learning process, whereas PTQ and QLoRA lead to better performance.
Nov-20-2025
- Country:
  - North America > United States
    - California > Santa Clara County
      - Santa Clara (0.05)
    - Pennsylvania (0.04)
- Genre:
  - Research Report > New Finding (1.00)
- Technology: