Robust Multi-Objective Controlled Decoding of Large Language Models
Seongho Son, William Bankes, Sangwoong Yoon, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic
Large Language Models (LLMs) require alignment to become useful and safe conversational agents [Rafailov et al., 2023, Azar et al., 2023, Hong et al., 2024, Ethayarajh et al., 2024, Wu et al., 2024]. However, human preferences are diverse and nuanced, leading recent work to frame alignment as a multi-objective problem [Zhao et al., 2023, Shi et al., 2024] over a variety of desirable attributes and alignment objectives, for example, helpfulness, safety, honesty, and conciseness. Test-time alignment [Mudgal et al., 2023] enables flexible control over the importance of different objectives at inference time without expensive retraining. This is useful because the alignment of an LLM can then be tailored to a specific task, prompt, or interaction with users who hold diverse preferences [Sorensen et al., 2024b]. Existing methods for multi-objective alignment often formalize this problem through a weight vector that characterizes the relative importance of the objectives at deployment [Shi et al., 2024, Wang et al., 2024b,a, Rame et al., 2024]. In practice, the correct weighting of objectives is often unknown, leading to models that over-optimize specific alignment goals whilst under-prioritizing others. To address this problem, recent work has proposed several solutions, including treating weights as hyperparameters [Shi et al., 2024], learning specific weightings for different groups [Zhao et al.,
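For concreteness, a minimal sketch of the weight-vector formalization common in this line of work (standard linear scalarization; the paper's exact objective may differ): given per-objective reward models $r_1, \dots, r_k$ (e.g., helpfulness, safety, conciseness), responses are scored by the weighted combination

$$ r_w(x, y) = \sum_{i=1}^{k} w_i \, r_i(x, y), \qquad w \in \Delta^{k-1} = \Big\{ w \in \mathbb{R}_{\ge 0}^{k} : \sum_{i=1}^{k} w_i = 1 \Big\}, $$

and test-time alignment steers decoding toward responses $y$ that score highly under $r_w$ for a weight vector $w$ chosen at inference time rather than fixed at training.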
arXiv.org Artificial Intelligence
Mar-11-2025