SALSA: Soup-based Alignment Learning for Stronger Adaptation in RLHF
Chegini, Atoosa, Kazemi, Hamid, Mirzadeh, Iman, Yin, Dong, Horton, Maxwell, Nabi, Moin, Farajtabar, Mehrdad, Alizadeh, Keivan
arXiv.org Artificial Intelligence
In Large Language Model (LLM) development, Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning models with human values and preferences. RLHF traditionally relies on the Kullback-Leibler (KL) divergence between the current policy and a frozen initial policy, which serves as the reference and is added as a penalty in policy-optimization algorithms such as Proximal Policy Optimization (PPO). While this constraint prevents models from deviating too far from the initial checkpoint, it limits exploration of the reward landscape, reducing the model's ability to discover higher-quality solutions. As a result, policy optimization is often trapped in a narrow region of the parameter space, leading to suboptimal alignment and performance. This paper presents SALSA (Soup-based Alignment Learning for Stronger Adaptation), a novel approach designed to overcome these limitations by creating a more flexible and better-positioned reference model through weight-space averaging of two independently supervised fine-tuned (SFT) models. This model soup permits larger deviations in KL divergence and enables exploration of a more promising region of the solution space without sacrificing stability.
Nov-3-2024
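Below is a minimal sketch of the soup-based reference construction described in the abstract, assuming two independently supervised fine-tuned (SFT) checkpoints of the same architecture. The checkpoint paths, the build_soup_reference helper, and the uniform averaging weight are illustrative assumptions, not details taken from the paper's released code.

```python
# Sketch: build a weight-space-averaged ("model soup") reference for RLHF.
# Assumes two SFT checkpoints of the same architecture; names are illustrative.
import torch
from transformers import AutoModelForCausalLM


def build_soup_reference(ckpt_a: str, ckpt_b: str, alpha: float = 0.5):
    """Average the weights of two SFT models to form a frozen RLHF reference."""
    model_a = AutoModelForCausalLM.from_pretrained(ckpt_a)
    model_b = AutoModelForCausalLM.from_pretrained(ckpt_b)

    state_b = model_b.state_dict()
    soup_state = {}
    for name, param_a in model_a.state_dict().items():
        if param_a.dtype.is_floating_point:
            # Element-wise weight-space interpolation; alpha = 0.5 gives a uniform soup.
            soup_state[name] = alpha * param_a + (1.0 - alpha) * state_b[name]
        else:
            # Non-float buffers (e.g. integer indices) are copied unchanged.
            soup_state[name] = param_a

    model_a.load_state_dict(soup_state)
    model_a.eval()
    for p in model_a.parameters():
        p.requires_grad_(False)  # the reference stays frozen during policy optimization
    return model_a


# The soup model would then replace the usual frozen SFT checkpoint as pi_ref in a
# KL-penalized objective, e.g. reward(x, y) - beta * KL(pi_theta(.|x) || pi_ref(.|x)).
```

In this sketch the soup simply swaps in for the standard frozen reference; the PPO loop itself is unchanged, which is what lets the policy drift further in KL while still being anchored to a well-positioned point in weight space.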