Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models
Datta Nimmaturi, Vaishnavi Bhargava, Rajat Ghosh, Johnu George, Debojyoti Dutta
arXiv.org Artificial Intelligence
Fine-tuning large language models (LLMs) for complex reasoning with reinforcement learning (RL) continues to be prohibitively expensive. Through a phenomenological investigation of GRPO post-training dynamics, we identify a scaling law characterized by exponential reward saturation. The emergence of this early plateau motivates an important question: can GRPO be equipped with principled early stopping criteria to significantly reduce post-training compute while preserving downstream performance? Across four open-source models (Llama 3B/8B and Qwen 3B/7B), we perform a systematic empirical study of GRPO fine-tuning and derive scaling laws that accurately predict reward trajectories during training. Our analysis shows that GRPO reward curves are well approximated by an exponential saturation curve with three phases that are consistent across all models: (i) slow initial progress, (ii) rapid improvement, and (iii) saturation. We further show that a simple parametric scaling law, conditioned on model size, initial performance, and normalized training progress, reliably predicts the onset of plateauing performance. A key practical finding is that training beyond roughly 80% of a single epoch yields negligible reward gains while consuming a substantial fraction of total computation. Using our scaling law, practitioners can forecast these phase transitions early and select data-driven stopping points, substantially reducing GRPO compute without sacrificing final performance. Our results suggest that such predictive scaling laws are a promising tool for managing GRPO fine-tuning costs.
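To make the three-phase saturation law concrete, the following minimal Python sketch fits an exponential-saturation form R(t) = R_max - (R_max - R_0) * exp(-k * t) over normalized training progress t and solves for a data-driven stopping point. The functional form is inferred from the abstract's description; the synthetic reward measurements, the SciPy fitting routine, and the 2% residual-gain threshold are illustrative assumptions, not the authors' published procedure.

    # Sketch of an exponential-saturation fit for GRPO reward curves.
    # Functional form, data, and threshold are assumptions for illustration.
    import numpy as np
    from scipy.optimize import curve_fit

    def saturating_reward(t, r0, r_max, k):
        """Exponential saturation: reward rises from r0 toward r_max
        at rate k as normalized training progress t goes from 0 to 1."""
        return r_max - (r_max - r0) * np.exp(-k * t)

    # Hypothetical early measurements: (normalized progress, mean reward).
    t_obs = np.array([0.05, 0.10, 0.15, 0.20, 0.25, 0.30])
    r_obs = np.array([0.29, 0.44, 0.55, 0.62, 0.68, 0.73])

    # Fit the three-parameter law to the partial trajectory.
    (p_r0, p_rmax, p_k), _ = curve_fit(
        saturating_reward, t_obs, r_obs, p0=[r_obs[0], 1.0, 3.0]
    )

    # Plateau onset: progress after which less than eps of the total
    # predicted gain remains. Since the remaining gain fraction is
    # exp(-k * t), solving exp(-k * t) = eps gives t = -ln(eps) / k.
    eps = 0.02  # illustrative 2% residual-gain tolerance
    t_stop = -np.log(eps) / p_k
    print(f"fitted r0={p_r0:.3f}, r_max={p_rmax:.3f}, k={p_k:.2f}")
    print(f"suggested stopping point: {min(t_stop, 1.0):.0%} of one epoch")

Because only the first ~30% of an epoch is used for the fit in this sketch, the stopping point can be forecast well before the plateau is actually reached, which is the practical use case the abstract describes.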
Dec 2, 2025