Robust Multi-Objective Controlled Decoding of Large Language Models
Seongho Son, William Bankes, Sangwoong Yoon, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic
Large Language Models (LLMs) require alignment to become useful and safe conversational agents [Rafailov et al., 2023, Azar et al., 2023, Hong et al., 2024, Ethayarajh et al., 2024, Wu et al., 2024]. However, human preferences are diverse and nuanced, leading recent work to frame alignment as a multi-objective problem [Zhao et al., 2023, Shi et al., 2024] over a variety of desirable attributes and alignment objectives, for example, helpfulness, safety, honesty, and conciseness. Test-time alignment [Mudgal et al., 2023] enables flexible control over the importance of different objectives at inference time without expensive retraining. This is useful because the alignment of an LLM can then be tailored to a specific task, prompt, or interaction with users who hold diverse preferences [Sorensen et al., 2024b]. Existing methods for multi-objective alignment often formalize this problem through a weight vector that characterizes the relative importance of the objectives at deployment [Shi et al., 2024, Wang et al., 2024b,a, Rame et al., 2024]. In practice, the correct weighting of objectives is often unknown, leading to models that over-optimize specific alignment goals whilst under-prioritizing others. To address this problem, recent work has proposed several solutions, including treating weights as hyperparameters [Shi et al., 2024], learning specific weightings for different groups [Zhao et al.,
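For concreteness, a minimal sketch of the weight-vector formalization common in this line of work (standard linear scalarization; the paper's exact objective may differ): given per-objective reward models $r_1, \dots, r_k$ (e.g., helpfulness, safety, conciseness), responses are scored by the weighted combination

$$ r_w(x, y) = \sum_{i=1}^{k} w_i \, r_i(x, y), \qquad w \in \Delta^{k-1} = \Big\{ w \in \mathbb{R}_{\ge 0}^{k} : \sum_{i=1}^{k} w_i = 1 \Big\}, $$

and test-time alignment steers decoding toward responses $y$ that score highly under $r_w$ for a weight vector $w$ chosen at inference time rather than fixed at training.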
arXiv.org Artificial Intelligence
Mar-11-2025