Evolving Alignment via Asymmetric Self-Play

Ye, Ziyu, Agarwal, Rishabh, Liu, Tianqi, Joshi, Rishabh, Velury, Sarmishta, Le, Quoc V., Tan, Qijun, Liu, Yuan

Dec-12-2024–arXiv.org Machine Learning

Current RLHF frameworks for aligning large language models (LLMs) typically assume a fixed prompt distribution, which is sub-optimal and limits both the scalability of alignment and the generalizability of models. To address this, we introduce a general open-ended RLHF framework that casts alignment as an asymmetric game between two players: (i) a creator that generates increasingly informative prompt distributions using reward signals, and (ii) a solver that learns to produce more preferred responses on prompts produced by the creator. This framework, Evolving Alignment via Asymmetric Self-Play (eva), yields a simple and efficient approach that can use any existing RLHF algorithm for scalable alignment. eva outperforms state-of-the-art methods on widely used benchmarks without requiring any additional human-crafted prompts. Specifically, eva improves the win rate of Gemma-2-9B-it on Arena-Hard from 51.6% to 60.1% with DPO, from 55.7% to 58.9% with SPPO, from 52.3% to 60.7% with SimPO, and from 54.8% to 60.3% with ORPO, surpassing its 27B version and matching claude-3-opus. The improvement persists even when new human-crafted prompts are introduced. Finally, we show that eva is effective and robust under various ablation settings.
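The creator/solver game described above can be pictured with a toy sketch. This is not the authors' implementation: the abstract only says the creator uses reward signals, so the specific informativeness proxy here (the gap between the best and worst reward among sampled responses) is an assumption, and every name (`score_prompt`, `eva_round`, `sample`, `reward`, `evolve`, `update`, `demo`) is hypothetical. The `update` callback stands in for whichever existing RLHF algorithm consumes preference pairs.

```python
import random

def score_prompt(prompt, sample, reward, rng, n):
    """Sample n responses; return ((prompt, chosen, rejected), reward gap)."""
    responses = [sample(prompt, rng) for _ in range(n)]
    ranked = sorted(responses, key=lambda r: reward(prompt, r))
    worst, best = ranked[0], ranked[-1]
    gap = reward(prompt, best) - reward(prompt, worst)
    return (prompt, best, worst), gap

def eva_round(prompts, sample, reward, evolve, update, rng, n=4):
    """One round of the two-player loop (toy version, assumptions as stated)."""
    # Creator: weight prompts by the assumed informativeness proxy (reward gap),
    # resample them, and rewrite each selected prompt into a new one.
    gaps = [score_prompt(p, sample, reward, rng, n)[1] for p in prompts]
    weights = [g + 1e-6 for g in gaps]  # small floor keeps all weights positive
    parents = rng.choices(prompts, weights=weights, k=len(prompts))
    new_prompts = [evolve(p) for p in parents]
    # Solver: build preference pairs on the creator's prompts and hand them
    # to a preference-optimization update (stubbed out here).
    pairs = [score_prompt(p, sample, reward, rng, n)[0] for p in new_prompts]
    update(pairs)
    return new_prompts, pairs

def demo(seed=0):
    """Run one round with stand-in policy, reward model, and prompt rewriter."""
    rng = random.Random(seed)
    scores = {"good": 1.0, "ok": 0.5, "bad": 0.0}
    # Stand-in policy: uncertain on "hard" prompts, constant otherwise.
    sample = lambda p, r: r.choice(["good", "bad"]) if "hard" in p else "ok"
    reward = lambda p, resp: scores[resp]       # stand-in reward model
    evolve = lambda p: p + " (variant)"         # stand-in prompt rewriter
    seen_pairs = []                             # stand-in for a training step
    return eva_round(["easy prompt", "hard prompt"],
                     sample, reward, evolve, seen_pairs.extend, rng)
```

Calling `demo()` returns the evolved prompts together with the preference pairs the solver would train on. In this toy setup, prompts on which the policy's responses all score the same get near-zero weight, so the creator tends to resample prompts where responses still differ in reward.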


