Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation

Shuo Tang, Xianghe Pang, Zexi Liu, Bohan Tang, Rui Ye, Xiaowen Dong, Yanfeng Wang, Siheng Chen

arXiv.org Artificial Intelligence 

We conducted experiments comparing the effectiveness of using simpler versus more complex datasets at different stages of the post-training process to better understand the optimal post-training strategy for large language models. We compare two kinds of instructions: simple instructions and specialized instructions, denoted as type 1 and type 2. As shown in Table 10, performing SFT on the simpler instructions helps the model establish a foundational level of instruction-following ability. This is reflected in moderate performance on AlpacaEval 2 (LC 16.25%, WR 17.62%) but lower performance on the more challenging Arena-Hard benchmark (WR 10.7%). When the model is instead fine-tuned on the more specialized and complex data, the improvement is only marginal (LC 14.70%, WR 16.01%, Arena-Hard WR 14.7%); the significant performance gains come when DPO is applied after SFT. For example, SFT followed by DPO with complex, specialized instructions yields substantial improvements (LC 21.64%, WR 30.06%, …).
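To make the two-stage setup concrete, the sketch below shows one way the SFT-then-DPO recipe could be run with HuggingFace TRL. The library choice, base model name, dataset file names, and hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
# A minimal sketch of the two-stage recipe: SFT on simple (type 1)
# instructions, then DPO on specialized (type 2) preference pairs.
# Assumes a recent version of HuggingFace TRL; all paths, model names,
# and hyperparameters below are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer

BASE_MODEL = "meta-llama/Meta-Llama-3-8B"  # placeholder, not the paper's exact base model

# Stage 1: SFT on simple instructions (dataset provides a "text" or "messages" column).
sft_data = load_dataset("json", data_files="type1_simple_instructions.json", split="train")
sft_trainer = SFTTrainer(
    model=BASE_MODEL,
    train_dataset=sft_data,
    args=SFTConfig(output_dir="ckpt-sft-simple", num_train_epochs=2),
)
sft_trainer.train()
sft_trainer.save_model("ckpt-sft-simple")

# Stage 2: DPO on specialized preference pairs ("prompt"/"chosen"/"rejected"
# columns), starting from the SFT checkpoint rather than the raw base model.
dpo_data = load_dataset("json", data_files="type2_specialized_preferences.json", split="train")
model = AutoModelForCausalLM.from_pretrained("ckpt-sft-simple")
tokenizer = AutoTokenizer.from_pretrained("ckpt-sft-simple")
dpo_trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="ckpt-dpo-specialized", beta=0.1, num_train_epochs=1),
    train_dataset=dpo_data,
    processing_class=tokenizer,
)
dpo_trainer.train()
dpo_trainer.save_model("ckpt-dpo-specialized")
```

The design point mirrored from the experiment is the ordering: DPO starts from the simple-instruction SFT checkpoint, and the complex, specialized data enters only at the preference-optimization stage.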
