From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning
Yafu Li, Zhilin Wang, Tingchen Fu, Ganqu Cui, Sen Yang, Yu Cheng
arXiv.org Artificial Intelligence
Scaling data and model size has proven effective for boosting the performance of large language models. Beyond training-time scaling, recent studies have shown that increasing test-time computation can further improve performance. In this work, we introduce Aggregation Fine-Tuning (AFT), a supervised fine-tuning (SFT) paradigm in which the model learns to synthesize multiple draft responses, referred to as proposals, into a single refined answer, termed the aggregation. At inference time, a propose-and-aggregate strategy further boosts performance by iteratively generating proposals and aggregating them. Empirical evaluations on benchmark datasets show that AFT-trained models substantially outperform standard SFT models. Notably, an AFT model fine-tuned from Llama3.1-8B-Base with only 64k training examples achieves a 41.3% LC win rate on AlpacaEval 2, surpassing significantly larger LLMs such as Llama3.1-405B-Instruct and GPT-4. By combining sequential refinement with parallel sampling, the propose-and-aggregate framework scales inference-time computation flexibly. Overall, these findings position AFT as a promising approach to unlocking additional capabilities of LLMs without increasing data volume or model size.
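To make the propose-and-aggregate strategy concrete, here is a minimal Python sketch of the inference loop the abstract describes. Everything here is an illustrative assumption, not the authors' implementation: the `generate` callable (prompt in, completion out), the aggregation prompt template, and the default round and proposal counts are all hypothetical stand-ins.

```python
# Minimal sketch of propose-and-aggregate inference, assuming a
# hypothetical `generate(prompt) -> completion` interface. Prompt
# wording and defaults are illustrative, not from the paper.
from typing import Callable, List


def propose_and_aggregate(
    generate: Callable[[str], str],  # hypothetical LLM call: prompt -> text
    query: str,
    num_proposals: int = 4,          # parallel draft responses per round
    num_rounds: int = 2,             # sequential refinement rounds
) -> str:
    answer = ""
    for _ in range(num_rounds):
        # Parallel sampling: draft several proposal responses.
        proposals: List[str] = [generate(query) for _ in range(num_proposals)]
        if answer:
            # Carry the previous round's aggregate forward for refinement.
            proposals.append(answer)
        # Aggregation: ask the AFT-trained model to synthesize the
        # proposals into a single refined answer.
        agg_prompt = (
            f"Question: {query}\n\nCandidate responses:\n"
            + "\n---\n".join(proposals)
            + "\n\nSynthesize the candidates into one refined answer:"
        )
        answer = generate(agg_prompt)
    return answer
```

In this reading, `num_proposals` scales the parallel-sampling axis and `num_rounds` the sequential-refinement axis, which is how the framework can flexibly trade off inference-time compute.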
Jan-20-2025