Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models

Guanting Dong, Keming Lu, Chengpeng Li, Tingyu Xia, Bowen Yu, Chang Zhou, Jingren Zhou

Jul-18-2024–arXiv.org Artificial Intelligence

One core capability of large language models (LLMs) is following natural language instructions. However, how to automatically construct high-quality training data that enhances the complex instruction-following abilities of LLMs without manual annotation remains an open problem. We introduce AutoIF, a method for automatically generating instruction-following training data. AutoIF transforms the validation of instruction-following data quality into code verification: it requires LLMs to generate instructions, the corresponding code to check whether responses to those instructions are correct, and unit test samples to verify that code's correctness. Execution feedback-based rejection sampling then generates data for Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) training. AutoIF achieves significant improvements across three training algorithms (SFT, Offline DPO, and Online DPO) when applied to the leading open-source LLMs Qwen2 and LLaMA3, in both self-alignment and strong-to-weak distillation settings. Our code is publicly available at https://github.com/QwenLM/AutoIF.

[Figure: example of a verifiable instruction, "Keep your response under 20 characters in length.", applied to queries such as "Are you familiar with OET or Occupational English Test?" and "What is the weather like today?", with sampled responses.]

The instruction-following ability of large language models (LLMs) refers to their capacity to understand, interpret, and execute commands given to them in natural language (Lou et al., 2023; OpenAI et al., 2024). This ability is fundamental to contemporary LLMs because it enables them to leverage their underlying knowledge, interact intuitively with users (Ouyang et al., 2022), adapt to various requirements (Zhang et al., 2023), and perform complex tasks (Sun et al., 2024).
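The code-verification step described in the abstract can be sketched in Python as follows: an LLM-written verification function is trusted only if it reproduces the expected labels on LLM-written unit tests. This is a minimal sketch under stated assumptions, not the AutoIF implementation: the helper names (`load_verifier`, `passes_unit_tests`, `filter_verifier`), the convention that the generated function is called `evaluate`, and the rule "keep the first candidate that passes every test" are all hypothetical. Note that `exec` runs arbitrary code; a real pipeline should sandbox it, and this sketch does not.

```python
# Minimal sketch (assumptions labeled): an LLM-generated verification function
# is trusted only if it reproduces the labels of LLM-generated unit tests.
# WARNING: exec() runs arbitrary code; a real pipeline should sandbox it.
from typing import Callable, List, Optional, Tuple

Verifier = Callable[[str], bool]


def load_verifier(code: str, name: str = "evaluate") -> Optional[Verifier]:
    """Execute generated code in a fresh namespace and return the function `name`.

    The convention that the generated function is called `evaluate` is a
    hypothetical choice for this sketch.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
    except Exception:  # syntax errors and runtime errors both disqualify the code
        return None
    fn = namespace.get(name)
    return fn if callable(fn) else None


def passes_unit_tests(fn: Verifier, tests: List[Tuple[str, bool]]) -> bool:
    """True only if `fn` returns the expected label on every (response, label) test."""
    for response, expected in tests:
        try:
            if bool(fn(response)) != expected:
                return False
        except Exception:  # a verifier that crashes on a test case is rejected
            return False
    return True


def filter_verifier(candidates: List[str], tests: List[Tuple[str, bool]]) -> Optional[Verifier]:
    """Return the first candidate verifier that passes all tests (hypothetical rule)."""
    for code in candidates:
        fn = load_verifier(code)
        if fn is not None and passes_unit_tests(fn, tests):
            return fn
    return None


# Example for the instruction "Keep your response under 20 characters in length."
CANDIDATES = [
    "def evaluate(response):\n    return len(response) <= 200\n",  # wrong threshold
    "def evaluate(response):\n    return len(response) < 20\n",
]
UNIT_TESTS = [
    ("Yes, I'm familiar.", True),  # 18 characters
    ("The weather is sunny and it is warm today.", False),  # 42 characters
]
verifier = filter_verifier(CANDIDATES, UNIT_TESTS)
print(verifier is not None)  # prints True: only the second candidate passes
```

The unit tests are what catch the buggy first candidate: it accepts the 42-character response, so it disagrees with the expected `False` label and is discarded.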

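Given a verifier that has already been validated, the execution feedback-based rejection sampling described in the abstract turns sampled responses into training data: responses that pass become SFT targets, and pass/fail combinations become preference pairs for DPO. The sketch below is illustrative only. The names `rejection_sample` and `PreferencePair` are hypothetical, the rule of pairing every passing response with every failing one is an assumption (AutoIF's actual selection may differ), and a hand-written length check stands in for an LLM-generated verifier.

```python
# Minimal sketch (hypothetical names and pairing rule): execution-feedback
# rejection sampling that turns sampled responses into SFT examples and
# DPO preference pairs, given an already-validated verifier.
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # response that passed the verifier
    rejected: str  # response that failed the verifier


def rejection_sample(
    prompt: str,
    responses: List[str],
    verifier: Callable[[str], bool],
) -> Tuple[List[Tuple[str, str]], List[PreferencePair]]:
    """Split responses by verifier outcome; build SFT data and DPO pairs."""
    passed: List[str] = []
    failed: List[str] = []
    for response in responses:
        try:
            ok = bool(verifier(response))
        except Exception:  # treat a crashing check as a failed response
            ok = False
        (passed if ok else failed).append(response)

    sft_examples = [(prompt, r) for r in passed]
    # Assumed pairing rule: every passing response vs. every failing one.
    pairs = [PreferencePair(prompt, c, r) for c in passed for r in failed]
    return sft_examples, pairs


# Usage with a hand-written length check standing in for a generated verifier.
prompt = "What is the weather like today? Keep your response under 20 characters in length."
samples = ["Sunny and warm.", "The weather is sunny and it is warm today.", "Sunny."]
sft, pairs = rejection_sample(prompt, samples, lambda r: len(r) < 20)
print(len(sft), len(pairs))  # prints 2 2
```

This covers offline pair construction from a fixed pool of samples; for online DPO, the same filter would presumably be applied to fresh samples drawn from the policy being trained.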