Collective Constitutional AI: Aligning a Language Model with Public Input
Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, Deep Ganguli
arXiv.org Artificial Intelligence
There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them. To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs, from identifying a target population to sourcing principles to training and evaluating a model. We demonstrate the real-world practicality of this approach by creating what is, to our knowledge, the first LM fine-tuned with collectively sourced public input and evaluating this model against a baseline model trained with established principles from an LM developer. Our quantitative evaluations demonstrate several benefits of our approach: the CCAI-trained model shows lower bias across nine social dimensions compared to the baseline model, while maintaining equivalent performance on language, math, and helpful-harmless evaluations. Qualitative comparisons of the models suggest that the models differ on the basis of their respective constitutions, e.g., when prompted with contentious topics, the CCAI-trained model tends to generate responses that reframe the matter positively rather than refusing to respond. These results demonstrate a promising, tractable pathway toward publicly informed development of language models.
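The training stage described in the abstract follows the Constitutional AI pattern of critiquing and revising model outputs against sampled principles, with the constitution drawn from public input rather than written by the developer. The sketch below is a minimal illustration of that critique-and-revise loop only; the `generate` callable and the example principles are hypothetical placeholders, not the actual constitution or training code from the paper.

```python
# Minimal sketch of a Constitutional AI-style critique-and-revise step, here driven
# by collectively sourced principles. `generate` is a hypothetical stand-in for any
# LM completion call; the principles below are illustrative placeholders.
import random
from typing import Callable, List


def critique_and_revise(
    prompt: str,
    initial_response: str,
    principles: List[str],
    generate: Callable[[str], str],
) -> str:
    """Revise a draft response against one randomly sampled principle."""
    principle = random.choice(principles)

    # Ask the model to critique its own draft in light of the sampled principle.
    critique = generate(
        f"Principle: {principle}\n"
        f"Prompt: {prompt}\n"
        f"Response: {initial_response}\n"
        "Point out any ways the response conflicts with the principle."
    )

    # Ask the model to rewrite the draft so that it satisfies the principle.
    revision = generate(
        f"Principle: {principle}\n"
        f"Prompt: {prompt}\n"
        f"Original response: {initial_response}\n"
        f"Critique: {critique}\n"
        "Rewrite the response so it follows the principle."
    )
    return revision


# Placeholder principles (illustrative only); in practice these would come from the
# deliberative public-input process the paper describes.
public_principles = [
    "Choose the response that is most fair and impartial.",
    "Choose the response that most respects everyone's autonomy.",
]
```

In a Constitutional AI-style pipeline, the (prompt, revision) pairs gathered this way would typically serve as supervised fine-tuning data, with preference comparisons for a later reinforcement-learning phase constructed in an analogous fashion.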
Jun-11-2024
- Country:
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Genre:
- Research Report > New Finding (0.66)
- Industry:
- Government (1.00)
- Health & Medicine (1.00)
- Law > Civil Rights & Constitutional Law (0.93)