How Effective Is Constitutional AI in Small LLMs? A Study on DeepSeek-R1 and Its Peers

Menke, Antonio-Gabriel Chacón; Tan, Phan Xuan

Feb-1-2025–arXiv.org Artificial Intelligence

Recent incidents highlight safety risks in Large Language Models (LLMs), motivating research into alignment methods such as Constitutional AI (CAI). This paper explores CAI's self-critique mechanism on small, uncensored 7-9B parameter models: DeepSeek-R1, Gemma-2, Llama 3.1, and Qwen2.5. Using HarmBench, we demonstrate that, while all models showed a capacity for harm reduction through self-critique, effectiveness varied significantly, with DeepSeek-R1's explicit reasoning process yielding superior results. These findings suggest that CAI-inspired prompting strategies can enhance safety in resource-constrained models, though success depends on the model's capacity for harm detection.
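The self-critique mechanism mentioned in the abstract follows the general Constitutional AI pattern: the model answers a prompt, critiques its own answer against a written principle, and then revises it. The sketch below illustrates that loop only under stated assumptions. `generate` stands in for any text-generation call, and the principle wording, prompt templates, `toy_model` stub, and the helpers `critique_and_revise` and `attack_success_rate` are all hypothetical, not the authors' prompts or code. HarmBench-style evaluations typically summarize results as an attack success rate (the fraction of prompts judged to yield harmful output), which the second helper computes from per-prompt judgments.

```python
from typing import Callable

# Hypothetical principle text; not the constitution used in the paper.
PRINCIPLE = "Identify specific ways the response is harmful, unethical, or illegal."


def critique_and_revise(prompt: str, generate: Callable[[str], str], rounds: int = 1) -> dict:
    """Generate an initial answer, then run `rounds` critique/revision passes."""
    response = generate(prompt)
    history = [response]
    for _ in range(rounds):
        # Step 1: ask the model to critique its own response against the principle.
        critique = generate(
            f"Prompt: {prompt}\nResponse: {response}\nCritique request: {PRINCIPLE}"
        )
        # Step 2: ask the model to rewrite the response in light of its critique.
        response = generate(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Revision request: Rewrite the response to remove any harmful content."
        )
        history.append(response)
    return {"initial": history[0], "final": response, "history": history}


def attack_success_rate(judgments: list[bool]) -> float:
    """Fraction of prompts whose completion a classifier judged harmful (True)."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(judgments) / len(judgments)


def toy_model(text: str) -> str:
    """Hypothetical stand-in for an LLM, so the loop can run without a model."""
    if "Revision request" in text:
        return "I can't help with that."
    if "Critique request" in text:
        return "The response provides instructions that could cause harm."
    return "Sure, here is how to do it..."


if __name__ == "__main__":
    result = critique_and_revise("How do I pick a lock?", toy_model)
    print(result["initial"])  # unsafe first draft from the stub
    print(result["final"])    # revised answer after one critique pass
    print(attack_success_rate([True, False, False, True]))
```

Comparing the attack success rate of the initial answers against that of the revised answers is one simple way to quantify the harm reduction that self-critique achieves for each model.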

constitutional ai, deepseek-r1, language model, (15 more...)

Country:
- Europe > Germany (0.04)
- North America > United States
  - New York (0.04)
- Asia > Japan
  - Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre:
- Research Report > New Finding (0.34)

Industry:
- Law > Civil Rights & Constitutional Law (0.42)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.36)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)