From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning
Yafu Li, Zhilin Wang, Tingchen Fu, Ganqu Cui, Sen Yang, Yu Cheng
arXiv.org Artificial Intelligence
Scaling data and model size has proven effective for boosting the performance of large language models. Beyond training-time scaling, recent studies have shown that increasing test-time computation can further improve performance. In this work, we introduce Aggregation Fine-Tuning (AFT), a supervised fine-tuning (SFT) paradigm in which the model learns to synthesize multiple draft responses, referred to as proposals, into a single refined answer, termed the aggregation. At inference time, a propose-and-aggregate strategy further boosts performance by iteratively generating proposals and aggregating them. Empirical evaluations on benchmark datasets show that AFT-trained models substantially outperform standard SFT models. Notably, an AFT model fine-tuned from Llama3.1-8B-Base with only 64k training examples achieves a 41.3% LC win rate on AlpacaEval 2, surpassing significantly larger LLMs such as Llama3.1-405B-Instruct and GPT-4. By combining sequential refinement with parallel sampling, the propose-and-aggregate framework scales inference-time computation flexibly. Overall, these findings position AFT as a promising approach to unlocking additional capabilities of LLMs without increasing data volume or model size.
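To make the propose-and-aggregate strategy concrete, here is a minimal Python sketch of the inference loop the abstract describes. Everything here is an illustrative assumption, not the authors' implementation: the `generate` callable (prompt in, completion out), the aggregation prompt template, and the default round and proposal counts are all hypothetical stand-ins.

```python
# Minimal sketch of propose-and-aggregate inference, assuming a
# hypothetical `generate(prompt) -> completion` interface. Prompt
# wording and defaults are illustrative, not from the paper.
from typing import Callable, List


def propose_and_aggregate(
    generate: Callable[[str], str],  # hypothetical LLM call: prompt -> text
    query: str,
    num_proposals: int = 4,          # parallel draft responses per round
    num_rounds: int = 2,             # sequential refinement rounds
) -> str:
    answer = ""
    for _ in range(num_rounds):
        # Parallel sampling: draft several proposal responses.
        proposals: List[str] = [generate(query) for _ in range(num_proposals)]
        if answer:
            # Carry the previous round's aggregate forward for refinement.
            proposals.append(answer)
        # Aggregation: ask the AFT-trained model to synthesize the
        # proposals into a single refined answer.
        agg_prompt = (
            f"Question: {query}\n\nCandidate responses:\n"
            + "\n---\n".join(proposals)
            + "\n\nSynthesize the candidates into one refined answer:"
        )
        answer = generate(agg_prompt)
    return answer
```

In this reading, `num_proposals` scales the parallel-sampling axis and `num_rounds` the sequential-refinement axis, which is how the framework can flexibly trade off inference-time compute.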
Jan-20-2025