Collective Constitutional AI: Aligning a Language Model with Public Input
Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, Deep Ganguli
arXiv.org Artificial Intelligence
There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them. To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs, from identifying a target population to sourcing principles to training and evaluating a model. We demonstrate the real-world practicality of this approach by creating what is, to our knowledge, the first LM fine-tuned with collectively sourced public input and evaluating this model against a baseline model trained with established principles from an LM developer. Our quantitative evaluations demonstrate several benefits of our approach: the CCAI-trained model shows lower bias across nine social dimensions compared to the baseline model, while maintaining equivalent performance on language, math, and helpful-harmless evaluations. Qualitative comparisons of the models suggest that the models differ on the basis of their respective constitutions, e.g., when prompted with contentious topics, the CCAI-trained model tends to generate responses that reframe the matter positively rather than refusing to respond. These results demonstrate a promising, tractable pathway toward publicly informed development of language models.
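The training stage described in the abstract follows the Constitutional AI pattern of critiquing and revising model outputs against sampled principles, with the constitution drawn from public input rather than written by the developer. The sketch below is a minimal illustration of that critique-and-revise loop only; the `generate` callable and the example principles are hypothetical placeholders, not the actual constitution or training code from the paper.

```python
# Minimal sketch of a Constitutional AI-style critique-and-revise step, here driven
# by collectively sourced principles. `generate` is a hypothetical stand-in for any
# LM completion call; the principles below are illustrative placeholders.
import random
from typing import Callable, List


def critique_and_revise(
    prompt: str,
    initial_response: str,
    principles: List[str],
    generate: Callable[[str], str],
) -> str:
    """Revise a draft response against one randomly sampled principle."""
    principle = random.choice(principles)

    # Ask the model to critique its own draft in light of the sampled principle.
    critique = generate(
        f"Principle: {principle}\n"
        f"Prompt: {prompt}\n"
        f"Response: {initial_response}\n"
        "Point out any ways the response conflicts with the principle."
    )

    # Ask the model to rewrite the draft so that it satisfies the principle.
    revision = generate(
        f"Principle: {principle}\n"
        f"Prompt: {prompt}\n"
        f"Original response: {initial_response}\n"
        f"Critique: {critique}\n"
        "Rewrite the response so it follows the principle."
    )
    return revision


# Placeholder principles (illustrative only); in practice these would come from the
# deliberative public-input process the paper describes.
public_principles = [
    "Choose the response that is most fair and impartial.",
    "Choose the response that most respects everyone's autonomy.",
]
```

In a Constitutional AI-style pipeline, the (prompt, revision) pairs gathered this way would typically serve as supervised fine-tuning data, with preference comparisons for a later reinforcement-learning phase constructed in an analogous fashion.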
Jun-11-2024
- Country:
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Genre:
- Research Report > New Finding (0.66)
- Industry:
- Government (1.00)
- Health & Medicine (1.00)
- Law > Civil Rights & Constitutional Law (0.93)