Aligning to Thousands of Preferences via System Message Generalization Seongyun Lee 1 Sue Hyun Park 1 Seungone Kim 2

Mar-23-2025, 09:49:41 GMT–Neural Information Processing Systems

Although humans inherently have diverse values, current large language model (LLM) alignment methods often assume that aligning LLMs with the general public's preferences is optimal. A major challenge in adopting a more individualized approach to LLM alignment is its lack of scalability, as it involves repeatedly acquiring preference data and training new reward models and LLMs for each individual's preferences. To address these challenges, we propose a new paradigm where users specify what they value most within the system message, steering the LLM's generation behavior to better align with the user's intentions. However, a naive application of such an approach is non-trivial since LLMs are typically trained on a uniform system message (e.g., "You are a helpful assistant"), which limits their ability to generalize to diverse, unseen system messages.

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Mar-23-2025, 09:49:41 GMT

Conferences PDF

Add feedback

Country:
- North America
  - Mexico > Mexico City (0.14)
  - United States > Louisiana (0.14)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (1.00)

Industry:
- Education (1.00)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found