MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models

Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin Huang, Tianlin Zhang, Sophia Ananiadou

May-6-2024–arXiv.org Artificial Intelligence

Recent advancements in large language models (LLMs) aim to tackle heterogeneous human expectations and values via multi-objective preference alignment. However, existing methods are parameter-adherent to the policy model, which leads to two key limitations: (1) their alignment algorithms must be repeated, at high cost, for each new target model; and (2) they cannot expand to unseen objectives because their alignment objectives are static. In this work, we propose Meta-Objective Aligner (MetaAligner), a model that performs conditional weak-to-strong correction, revising weak responses so that they approach strong responses. MetaAligner is the first policy-agnostic and generalizable method for multi-objective preference alignment: it enables plug-and-play alignment by decoupling parameter updates from the policy models, and it facilitates zero-shot preference alignment for unseen objectives via in-context learning. Experimental results show that MetaAligner achieves significant and balanced improvements in multi-objective alignment on 10 state-of-the-art policy models and outperforms previous alignment methods while requiring up to 15.71x fewer GPU training hours. The model also effectively aligns unseen objectives, marking the first step towards generalizable multi-objective preference alignment.
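The plug-and-play idea described in the abstract can be sketched as a two-stage pipeline: a frozen policy model produces a (weak) response, and a separate aligner, conditioned on natural-language objective descriptions, rewrites it. This is a minimal sketch under stated assumptions, not MetaAligner's actual implementation. The prompt template, the objective descriptions, and the names `build_alignment_prompt`, `aligned_generate`, and `OBJECTIVES` are all hypothetical, and both models are represented as plain text-to-text callables.

```python
from typing import Callable, Dict

# Hypothetical objective descriptions. Adding an "unseen" objective is just
# another text entry, which is how in-context conditioning avoids retraining.
OBJECTIVES: Dict[str, str] = {
    "harmlessness": "The response avoids harmful or unsafe content.",
    "helpfulness": "The response directly addresses the user's request.",
}


def build_alignment_prompt(query: str, weak_response: str,
                           objectives: Dict[str, str]) -> str:
    """Format a conditional correction prompt (hypothetical template)."""
    # One bullet per objective, so the aligner sees each target in context.
    objective_lines = "\n".join(
        f"- {name}: {desc}" for name, desc in objectives.items()
    )
    return (
        "Edit the following response so that it better satisfies "
        "these objectives:\n"
        f"{objective_lines}\n"
        f"Query: {query}\n"
        f"Response: {weak_response}\n"
        "Edited response:"
    )


def aligned_generate(query: str,
                     policy: Callable[[str], str],
                     aligner: Callable[[str], str],
                     objectives: Dict[str, str]) -> str:
    """Query the policy (never updated), then let the aligner revise its output."""
    weak_response = policy(query)
    prompt = build_alignment_prompt(query, weak_response, objectives)
    return aligner(prompt)


if __name__ == "__main__":
    # Toy stand-ins; a real setup would call an LLM policy and a trained aligner.
    def toy_policy(q: str) -> str:
        return "No idea."

    def toy_aligner(prompt: str) -> str:
        return "Here is a clearer, more careful answer."

    # An unseen objective is supplied purely as text at inference time.
    objectives = {**OBJECTIVES,
                  "empathy": "The response acknowledges the user's feelings."}
    query = "How do I cope with stress?"
    print(build_alignment_prompt(query, toy_policy(query), objectives))
    print(aligned_generate(query, toy_policy, toy_aligner, objectives))
```

Because the policy is only called and never fine-tuned in this sketch, swapping in a different policy model requires no change to the aligner, which mirrors the policy-agnostic property the abstract claims.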

large language model, machine learning, natural language

Genre:
- Research Report > New Finding (0.88)

Industry:
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.94)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language
    - Chatbot (1.00)
    - Large Language Model (1.00)