Aligning to Thousands of Preferences via System Message Generalization Seongyun Lee 1 Sue Hyun Park 1 Seungone Kim 2
–Neural Information Processing Systems
Although humans inherently have diverse values, current large language model (LLM) alignment methods often assume that aligning LLMs with the general public's preferences is optimal. A major challenge in adopting a more individualized approach to LLM alignment is its lack of scalability, as it involves repeatedly acquiring preference data and training new reward models and LLMs for each individual's preferences. To address these challenges, we propose a new paradigm where users specify what they value most within the system message, steering the LLM's generation behavior to better align with the user's intentions. However, a naive application of such an approach is non-trivial since LLMs are typically trained on a uniform system message (e.g., "You are a helpful assistant"), which limits their ability to generalize to diverse, unseen system messages.
Neural Information Processing Systems
Mar-23-2025, 09:49:41 GMT
- Country:
- North America
- Mexico > Mexico City (0.14)
- United States > Louisiana (0.14)
- North America
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Research Report
- Industry:
- Education (1.00)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (0.67)
- Technology: