Direct Alignment with Heterogeneous Preferences
Ali Shirali, Arash Nasr-Esfahany, Abdullah Alomar, Parsa Mirtaheri, Rediet Abebe, Ariel Procaccia
–arXiv.org Artificial Intelligence
This tension in assumptions is readily apparent in standard human-AI alignment methods, such as reinforcement learning from human feedback (RLHF) [6, 7, 8] and direct preference optimization (DPO) [9], which assume that a single reward function captures the interests of the entire population. We examine the limits of this preference homogeneity assumption when individuals belong to distinct user types, each characterized by its own reward function. Recent work has shown that in this setting the homogeneity assumption can lead to unexpected behavior [10, 11, 12]. One challenge is that, under this assumption, learning from human preferences becomes unrealizable: a single reward function cannot capture the complexity of population preferences generated by multiple reward functions [13, 14]. Both RLHF and DPO rely on maximum likelihood estimation (MLE) to optimize the reward or policy. Unrealizability implies that their likelihood functions cannot fully represent the underlying preference data distribution, resulting in a nontrivial optimal MLE solution. From another perspective, learning a universal reward or policy from a heterogeneous population inherently involves aggregating diverse interests, and this aggregation is nontrivial. In the quest for a single policy that accommodates a heterogeneous population with multiple user types, we show that the only universal reward yielding a well-defined alignment problem is an affine combination of the individual rewards.
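To make the unrealizability point concrete, here is a minimal sketch under the Bradley-Terry preference model that underlies both RLHF and DPO (the symbols $r_k$, $\alpha_k$, and $\sigma$ are notation introduced for this illustration, not taken from the paper). A single reward $r$ models pairwise preferences as $P(y_1 \succ y_2 \mid x) = \sigma(r(x, y_1) - r(x, y_2))$, where $\sigma$ is the logistic function. If the population instead consists of $K$ user types, where type $k$ holds reward $r_k$ and makes up a fraction $\alpha_k$ of annotators, the observed preference data follow a mixture:

$$
P(y_1 \succ y_2 \mid x) \;=\; \sum_{k=1}^{K} \alpha_k \,\sigma\!\big(r_k(x, y_1) - r_k(x, y_2)\big).
$$

A mixture of logistic terms is, in general, not itself logistic in any single reward difference, so no single $r$ reproduces this distribution across all prompts and response pairs. The MLE objectives of RLHF and DPO therefore fit a model family that does not contain the data-generating distribution, which is the sense in which the learning problem becomes unrealizable.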
Feb-22-2025
arXiv:2502.16320v1