Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models
Costello, Caia; Guo, Simon; Goldie, Anna; Mirhoseini, Azalia
arXiv.org Artificial Intelligence
ABSTRACT
Large language models (LLMs) have demonstrated strong capabilities in programming and mathematical reasoning tasks, but they are constrained by limited high-quality training data. Synthetic data can be leveraged to improve fine-tuning outcomes, but several factors influence this process, including model size, synthetic data volume, pruning strategy, and number of fine-tuning rounds. We explore these axes and investigate which conditions enable model self-improvement. We introduce the Think, Prune, Train process, a scalable framework that iteratively fine-tunes models on their own reasoning traces, using ground-truth pruning (discarding traces whose final answers do not match the ground truth) to ensure high-quality training data. This approach yields improved performance: on GSM8K, Gemma2-2B achieves a Pass@1 of 57.6% (from 41.9%), Gemma2-9B reaches 82%, matching LLaMA-3.1-70B, and LLaMA-3.1-70B […]

One promising approach is to leverage curated synthetic data to improve reasoning, an essential part of advancing code generation and mathematical problem-solving. Recent frontier models such as LLaMA 3.1 Dubey et al. (2024) and DeepSeek R1 DeepSeek AI Team (2024) demonstrate that post-training on reasoning traces, combined with supervised fine-tuning (SFT) on filtered (pruned) data, is an effective way to improve models. Their strong performance on coding and math benchmarks shows how properly curated synthetic data can drive substantial performance gains. For smaller models such as LLaMA (1B, 3B) and Gemma (2B, 9B) Team et al. (2024b), distillation Hinton et al. (2015) combined with fine-tuning on reasoning-trace datasets has become the dominant post-training paradigm.
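To make the Think, Prune, Train loop concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. Each round samples k reasoning traces per problem from the current model (think), keeps only traces whose final answer matches the ground-truth label (prune), and fine-tunes on the survivors (train). All names (`Problem`, `think_prune_train`, `extract_final_answer`, `ToyModel`, `toy_fine_tune`) are hypothetical. The trailing `#### <number>` answer format is assumed (it mirrors GSM8K reference solutions). `ToyModel` and its made-up accuracy bump stand in for an LLM and for SFT. `pass_at_k` is the unbiased estimator from Chen et al. (2021), included as an assumption about how Pass@1 could be computed; for k = 1 it reduces to the fraction of correct samples.

```python
import random
import re
from dataclasses import dataclass
from math import comb
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Problem:
    question: str
    answer: str  # ground-truth final answer, e.g. "18"


# Hypothetical interfaces (not from the paper): a model maps a question plus an RNG
# to a reasoning trace; fine-tuning maps (model, kept traces) to a new model.
Model = Callable[[str, random.Random], str]
Trace = Tuple[Problem, str]
FineTune = Callable[[Model, List[Trace]], Model]


def extract_final_answer(trace: str) -> Optional[str]:
    """Parse a trace ending in GSM8K-style '#### <number>' (format assumed)."""
    match = re.search(r"####\s*(-?[\d,.]+)\s*$", trace.strip())
    return match.group(1).replace(",", "") if match else None


def think(model: Model, problems: List[Problem], k: int, rng: random.Random) -> List[Trace]:
    """Think: sample k reasoning traces per problem from the current model."""
    return [(p, model(p.question, rng)) for p in problems for _ in range(k)]


def prune(traces: List[Trace]) -> List[Trace]:
    """Prune: keep only traces whose final answer matches the ground truth."""
    return [(p, t) for p, t in traces if extract_final_answer(t) == p.answer]


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k from n samples with c correct; pass@1 reduces to c / n."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def think_prune_train(model: Model, fine_tune: FineTune, problems: List[Problem],
                      rounds: int, k: int, seed: int = 0) -> Tuple[Model, List[int]]:
    """Train: each round fine-tunes on the current model's own pruned traces."""
    rng = random.Random(seed)
    kept_per_round = []
    for _ in range(rounds):
        kept = prune(think(model, problems, k, rng))
        kept_per_round.append(len(kept))
        model = fine_tune(model, kept)
    return model, kept_per_round


@dataclass
class ToyModel:
    """Toy stand-in for an LLM: answers 'a + b' correctly with probability `accuracy`."""
    accuracy: float

    def __call__(self, question: str, rng: random.Random) -> str:
        a, b = map(int, re.findall(r"\d+", question))
        guess = a + b if rng.random() < self.accuracy else a + b + 1
        return f"{a} + {b} = {guess}\n#### {guess}"


def toy_fine_tune(model: ToyModel, kept: List[Trace]) -> ToyModel:
    """Toy stand-in for SFT: a made-up accuracy bump proportional to kept traces."""
    return ToyModel(min(1.0, model.accuracy + 0.01 * len(kept)))


problems = [Problem(f"What is {a} + {b}?", str(a + b)) for a, b in [(2, 3), (10, 7), (41, 1)]]
final_model, kept_counts = think_prune_train(ToyModel(0.5), toy_fine_tune, problems, rounds=3, k=4)
print(kept_counts, round(final_model.accuracy, 2))
```

The design choice worth noting is that pruning only needs a final-answer check against the label, so no reward model or judge is involved. The trade-off is that a trace with flawed reasoning but a correct final answer still survives the filter.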
Apr-28-2025