UC-MOA: Utility-Conditioned Multi-Objective Alignment for Distributional Pareto-Optimality

Zelei Cheng, Xin-Qiang Cai, Yuting Tang, Pushi Zhang, Boming Yang, Xinyu Xing

arXiv.org Artificial Intelligence 

Reinforcement Learning from Human Feedback (RLHF) has become a cornerstone for aligning large language models (LLMs) with human values. However, existing approaches struggle to capture the multi-dimensional, distributional nuances of human preferences. Methods such as RiC that directly inject raw reward values into prompts face significant numerical sensitivity issues--for instance, LLMs may fail to distinguish between 9.11 and 9.8--while alternatives like MORLHF, Rewarded Soups, and MODPO incur high computational costs by training multiple models. In this work, we introduce Utility-Conditioned Multi-Objective Alignment (UC-MOA), a novel framework that overcomes these limitations. Our approach leverages a diverse set of strictly increasing, non-linear utility functions to transform user-specified preferences into symbolic tokens, which are then used to condition a single LLM. This design not only mitigates numerical reasoning challenges but also substantially reduces training overhead, yielding models that achieve superior Pareto fronts and robust alignment across complex reward dimensions.
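To make the conditioning idea concrete, here is a minimal sketch contrasting raw-reward prompt injection with utility-conditioned prompting. The utility functions, token names, and helper functions below are illustrative assumptions; the abstract does not specify the paper's exact utility family, token format, or selection procedure.

```python
# Minimal sketch of utility-conditioned prompting vs. raw-reward injection.
# The utility family and token names here are assumptions for illustration.
import numpy as np

# A small family of strictly increasing, non-linear utility functions
# (illustrative choices only, not the paper's exact set).
UTILITY_FAMILY = {
    "<UTIL_LOG>":  lambda r: np.log1p(np.maximum(r, 0.0)),
    "<UTIL_SQRT>": lambda r: np.sqrt(np.maximum(r, 0.0)),
    "<UTIL_EXP>":  lambda r: 1.0 - np.exp(-np.maximum(r, 0.0)),
}


def raw_reward_prompt(prompt: str, rewards: np.ndarray) -> str:
    """RiC-style conditioning: raw reward values are injected as text,
    so the LLM must compare numbers such as 9.11 vs. 9.8 directly."""
    scores = " ".join(f"<r{i}: {r:.2f}>" for i, r in enumerate(rewards))
    return f"{scores} {prompt}"


def utility_conditioned_prompt(prompt: str, utility_token: str) -> str:
    """UC-MOA-style conditioning (as sketched here): the chosen utility
    function is referenced by a discrete symbolic token, so no numeric
    comparison is required of the model at inference time."""
    assert utility_token in UTILITY_FAMILY
    return f"{utility_token} {prompt}"


if __name__ == "__main__":
    rewards = np.array([9.11, 9.8])  # e.g. helpfulness, harmlessness scores
    print(raw_reward_prompt("Summarize the article:", rewards))
    print(utility_conditioned_prompt("Summarize the article:", "<UTIL_LOG>"))

    # During training, a sample's reward vector would be passed through the
    # selected utility before supervision, e.g.:
    print(UTILITY_FAMILY["<UTIL_LOG>"](rewards))
```

The key design point the abstract emphasizes is that the model only sees a discrete symbolic token identifying the utility, not raw floating-point rewards, which sidesteps numerical-comparison failures while still allowing a single model to cover many preference trade-offs.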