C3AI: Crafting and Evaluating Constitutions for Constitutional AI

Kyrychenko, Yara, Zhou, Ke, Bogucka, Edyta, Quercia, Daniele

Feb-21-2025–arXiv.org Artificial Intelligence

Constitutional AI (CAI) guides LLM behavior using constitutions, but identifying which principles are most effective for model alignment remains an open challenge. We introduce the C3AI framework (Crafting Constitutions for CAI models), which serves two key functions: (1) selecting and structuring principles to form effective constitutions before fine-tuning; and (2) evaluating whether fine-tuned CAI models follow these principles in practice. By analyzing principles from AI and psychology, we found that positively framed, behavior-based principles align more closely with human preferences than negatively framed or trait-based principles. In a safety alignment use case, we applied a graph-based principle selection method to refine an existing CAI constitution, improving safety measures while maintaining strong general reasoning capabilities. Interestingly, fine-tuned CAI models performed well on negatively framed principles but struggled with positively framed ones, in contrast to our human alignment results. This highlights a potential gap between principle design and model adherence. Overall, C3AI provides a structured and scalable approach to both crafting and evaluating CAI constitutions.

Country:
- Oceania > Australia
  - New South Wales > Sydney (0.05)
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.14)
    - Nottinghamshire > Nottingham (0.14)
    - Oxfordshire > Oxford (0.04)
  - Latvia > Lubāna Municipality
    - Lubāna (0.04)
  - Italy > Piedmont
    - Turin Province > Turin (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)

Genre:
- Research Report > New Finding (0.93)
- Overview (0.87)

Industry:
- Law > Civil Rights & Constitutional Law (1.00)
- Government (0.93)
- Information Technology > Security & Privacy (0.92)
- Health & Medicine > Consumer Health (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.69)
