Aligning Language Models with Human Preferences via a Bayesian Approach

Feb-11-2025, 06:00:41 GMT – Neural Information Processing Systems

In the quest to advance human-centric natural language generation (NLG) systems, ensuring alignment between NLG models and human preferences is crucial. To achieve this alignment, currently popular methods use reinforcement learning (RL) with a reward model trained on human feedback. However, human preferences are subjective, and the resulting disagreements among annotators pose a significant challenge for training the reward model, which in turn degrades NLG performance. To tackle this issue, previous approaches typically rely on majority voting or averaging to consolidate multiple inconsistent preferences into a single merged one. Although straightforward to understand and execute, such methods cannot capture the nuanced degrees of disagreement among humans and may represent only a specialized subset of individuals, so they cannot quantify how universally a preference is held.
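
The limitation of majority voting and averaging described above can be illustrated with a minimal sketch. The annotation data, function names, and rating scales below are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch shows that two annotation sets with very different levels of agreement can collapse to the same merged label or score.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Merge labels by returning the most frequent one."""
    return Counter(labels).most_common(1)[0][0]


def average_score(ratings):
    """Merge graded ratings by taking their arithmetic mean."""
    return mean(ratings)


# Hypothetical binary preferences: 1 = annotator prefers response A, 0 = prefers B.
unanimous_votes = [1, 1, 1, 1, 1]   # every annotator agrees
split_votes = [1, 1, 1, 0, 0]       # a 3-to-2 split

# Hypothetical graded ratings on a 1-5 scale.
consistent_ratings = [3, 3, 3, 3]   # annotators agree on a middling score
polarized_ratings = [1, 5, 1, 5]    # annotators strongly disagree

# Both vote sets merge to the same label, so the 3-to-2 split is invisible.
merged_unanimous = majority_vote(unanimous_votes)
merged_split = majority_vote(split_votes)

# Both rating sets merge to the same mean, so the polarization is invisible.
mean_consistent = average_score(consistent_ratings)
mean_polarized = average_score(polarized_ratings)

print(merged_unanimous, merged_split)
print(mean_consistent, mean_polarized)
```

In both cases a reward model trained only on the merged value would receive no signal about how contested the preference was.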

aligning language model, bayesian approach, human preference

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Generation (0.99)
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.53)
  - Machine Learning > Learning Graphical Models
    - Directed Networks > Bayesian Learning (0.53)