Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models

Costello, Caia; Guo, Simon; Goldie, Anna; Mirhoseini, Azalia

Apr-28-2025–arXiv.org Artificial Intelligence

ABSTRACT. Large language models (LLMs) have demonstrated strong capabilities in programming and mathematical reasoning tasks, but are constrained by limited high-quality training data. Synthetic data can be leveraged to enhance fine-tuning outcomes, but several factors influence this process, including model size, synthetic data volume, pruning strategy, and number of fine-tuning rounds. We explore these axes and investigate which conditions enable model self-improvement. We introduce the Think, Prune, Train process, a scalable framework that iteratively fine-tunes models on their own reasoning traces, using ground-truth pruning to ensure high-quality training data. This approach yields improved performance: on GSM8K, Gemma2-2B achieves a Pass@1 of 57.6% (from 41.9%), Gemma2-9B reaches 82%, matching LLaMA-3.1-70B, and LLaMA-3.1-70B [...]

One promising approach is leveraging curated synthetic data to improve reasoning, an essential part of advancing code generation and mathematical problem-solving. Recent frontier models like LLaMA 3.1 (Dubey et al., 2024) and DeepSeek R1 (DeepSeek AI Team, 2024) demonstrate that post-training on reasoning traces, coupled with supervised fine-tuning (SFT) on filtered (pruned) data, is effective at improving models. Their strong performance on coding and math benchmarks highlights how properly curated synthetic data can drive substantial performance gains. For smaller models such as LLaMA (1B, 3B) and Gemma (2B, 9B) (Team et al., 2024b), distillation (Hinton et al., 2015) coupled with fine-tuning on reasoning trace datasets has become the dominant post-training paradigm.
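The iterative process named in the abstract (sample the model's own reasoning traces, keep only those that agree with ground truth, fine-tune on what remains, repeat) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Model` type, the exact-match pruning rule, and the toy `make_guessing_model` and `toy_train` helpers are all hypothetical stand-ins chosen so the loop can run without any LLM. In practice, `train_fn` would be a supervised fine-tuning step.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical types (assumptions, not from the paper):
# a model maps a question to one sampled (reasoning trace, final answer) pair.
Model = Callable[[str], Tuple[str, str]]
Example = Tuple[str, str, str]  # (question, reasoning trace, final answer)


def think(model: Model, questions: List[str], samples_per_question: int) -> List[Example]:
    """Sample several reasoning traces per question from the current model."""
    return [(q, *model(q)) for q in questions for _ in range(samples_per_question)]


def prune(examples: List[Example], ground_truth: Dict[str, str]) -> List[Example]:
    """Keep traces whose final answer matches ground truth (exact match is an assumption)."""
    return [ex for ex in examples if ex[2].strip() == ground_truth[ex[0]].strip()]


def pass_at_1(model: Model, ground_truth: Dict[str, str]) -> float:
    """Fraction of questions answered correctly with a single sample each."""
    correct = sum(model(q)[1].strip() == a.strip() for q, a in ground_truth.items())
    return correct / len(ground_truth)


def think_prune_train(
    model: Model,
    train_fn: Callable[[Model, List[Example]], Model],
    ground_truth: Dict[str, str],
    rounds: int,
    samples_per_question: int,
) -> Model:
    """Run several rounds of sampling, pruning, and training on the pruned data."""
    for _ in range(rounds):
        traces = think(model, list(ground_truth), samples_per_question)
        kept = prune(traces, ground_truth)
        model = train_fn(model, kept)  # stand-in for supervised fine-tuning
    return model


def make_guessing_model(rng: random.Random) -> Model:
    """Toy stand-in for an LLM: guesses a single digit as the answer."""
    return lambda q: ("guessed", str(rng.randint(0, 9)))


def toy_train(model: Model, kept: List[Example]) -> Model:
    """Toy stand-in for fine-tuning: memorize pruned traces, else defer to the old model."""
    learned = {q: (trace, answer) for q, trace, answer in kept}
    return lambda q: learned.get(q) or model(q)
```

For example, calling `think_prune_train(make_guessing_model(random.Random(0)), toy_train, {"2+3": "5", "4+4": "8"}, rounds=3, samples_per_question=50)` returns a toy model that has memorized the correct traces. The memorizing "trainer" only demonstrates the data flow; it says nothing about how much a real model improves per round.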

large language model, machine learning, natural language

Country:
- North America > United States (0.28)
- Asia (0.28)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
