SPRI: Aligning Large Language Models with Context-Situated Principles
Zhan, Hongli, Azmat, Muneeza, Horesh, Raya, Li, Junyi Jessy, Yurochkin, Mikhail
arXiv.org Artificial Intelligence
Aligning Large Language Models to integrate and reflect human values is arduous, especially for tasks that demand intricate human oversight, because depending on human expertise for context-specific guidance is resource-intensive and time-consuming. Prior work has used predefined sets of rules or principles to steer model behavior (Bai et al., 2022; Sun et al., 2023). However, these principles tend to be generic, which makes them hard to adapt to each individual input query or context. In this work, we present Situated-PRInciples (SPRI), a framework that requires minimal or no human effort and is designed to automatically generate guiding principles in real time for each input query and use them to align each response. We evaluate SPRI on three tasks and show that 1) in a complex domain-specific task, SPRI can derive principles that lead to performance on par with expert-crafted ones; 2) SPRI-generated principles lead to instance-specific rubrics that outperform prior LLM-as-a-judge frameworks; 3) using SPRI to generate synthetic SFT data leads to substantial improvement in truthfulness. We release our code and model generations at https://github.com/honglizhan/SPRI-public.
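The two steps named in the abstract (generate principles for one query, then use them to guide the response) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the prompt wording, and the `stub_llm` stand-in are all hypothetical, and any real model call would replace the stub.

```python
from typing import Callable, Tuple

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


def generate_principles(llm: LLM, query: str) -> str:
    """Step 1: ask the model for principles tailored to this single query."""
    prompt = (
        "Write a short list of principles that a good response to the "
        f"following query should follow.\n\nQuery: {query}\n\nPrinciples:"
    )
    return llm(prompt)


def respond_with_principles(llm: LLM, query: str, principles: str) -> str:
    """Step 2: ask the model to answer the query while following the principles."""
    prompt = (
        f"Principles:\n{principles}\n\n"
        f"Query: {query}\n\nResponse that follows the principles:"
    )
    return llm(prompt)


def situated_response(llm: LLM, query: str) -> Tuple[str, str]:
    """Run both steps and return (principles, response)."""
    principles = generate_principles(llm, query)
    response = respond_with_principles(llm, query, principles)
    return principles, response


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model (assumption): canned output per step."""
    if prompt.startswith("Write a short list of principles"):
        return "- Be accurate.\n- Acknowledge uncertainty."
    return "Here is a careful answer."


if __name__ == "__main__":
    principles, response = situated_response(stub_llm, "Is this claim true?")
    print(principles)
    print(response)
```

The point of the sketch is only that the principles are produced per query rather than fixed in advance; how the paper generates, checks, or refines them is not described in the abstract and is not modeled here.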
Feb-5-2025
- Country:
- South America > Colombia
- Meta Department > Villavicencio (0.04)
- Pacific Ocean > North Pacific Ocean
- San Francisco Bay > Golden Gate (0.04)
- North America
- United States
- Arizona (0.05)
- Texas > Travis County
- Austin (0.14)
- New York > New York County
- New York City (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- California > San Francisco County
- San Francisco (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- Canada > Ontario
- Toronto (0.04)
- Europe
- France (0.04)
- United Kingdom > England
- Oxfordshire > Oxford (0.04)
- Cambridgeshire > Cambridge (0.04)
- Italy > Tuscany
- Florence (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Asia
- Singapore (0.04)
- Indonesia > Bali (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Middle East
- Saudi Arabia (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Genre:
- Research Report > New Finding (0.67)
- Industry:
- Information Technology (0.67)
- Health & Medicine
- Therapeutic Area > Psychiatry/Psychology (1.00)
- Consumer Health (1.00)
- Technology: