REAL: Response Embedding-based Alignment for LLMs

Honggen Zhang, Xufeng Zhao, Igor Molybog, June Zhang

arXiv.org Artificial Intelligence 

Aligning large language models (LLMs) to human preferences is a crucial step in building helpful and safe AI tools, and it usually involves training on supervised preference datasets. Popular algorithms such as Direct Preference Optimization (DPO) rely on pairs of AI-generated responses ranked according to human feedback. The response pair annotation process is the most labor-intensive and costly part of the alignment pipeline, and improving its efficiency and annotation quality would have a meaningful impact on AI development. We propose REAL: Response Embedding-based Alignment for LLMs, a strategy for constructing a high-quality training dataset that focuses on acquiring the most informative response pairs for labeling out of a set of response candidates. Our selection process embeds responses independently of their prompts. Experimental results on the real-world SHP2 dataset and the synthetic HH-RLHF benchmark indicate that choosing dissimilar response pairs enhances the direct alignment of LLMs while reducing inherited labeling errors. The model aligned on dissimilar response pairs achieved a higher margin and win rate on the dialogue task. Our findings suggest that focusing on distinct pairs can reduce labeling errors and improve the efficiency of LLM alignment, saving up to 65% of annotators' work.
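To make the selection idea concrete, below is a minimal sketch of embedding-based pair selection. It assumes a sentence-transformers encoder and a lowest-cosine-similarity criterion; the abstract does not specify the paper's exact embedding model or scoring rule, so these details are illustrative, not the authors' implementation.

```python
# Sketch: pick the most dissimilar pair of candidate responses for labeling.
# Assumptions (not from the paper): the "all-MiniLM-L6-v2" encoder and
# cosine similarity as the dissimilarity measure.
from itertools import combinations

import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice


def select_most_dissimilar_pair(responses: list[str]) -> tuple[int, int]:
    """Return indices of the two responses whose embeddings are least
    similar; this is the pair proposed for human preference annotation.
    Responses are embedded on their own, independently of the prompt,
    as described in the abstract."""
    emb = encoder.encode(responses, normalize_embeddings=True)
    best_pair, best_sim = (0, 1), np.inf
    for i, j in combinations(range(len(responses)), 2):
        # With unit-normalized embeddings, the dot product is the cosine
        # similarity; keep the pair with the smallest value.
        sim = float(np.dot(emb[i], emb[j]))
        if sim < best_sim:
            best_sim, best_pair = sim, (i, j)
    return best_pair


# Usage: choose the most informative pair out of four sampled candidates.
candidates = [
    "Sure, here is a step-by-step guide ...",
    "I would be happy to help. First ...",
    "That request is unsafe, so I cannot assist.",
    "Here is a quick summary instead ...",
]
i, j = select_most_dissimilar_pair(candidates)
print(f"Send responses {i} and {j} to annotators.")
```

The intuition behind this design choice is that near-duplicate responses are hard for annotators to rank (inviting label noise) and carry little preference signal, whereas clearly distinct responses are easier to label reliably.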