How Effective Is Constitutional AI in Small LLMs? A Study on DeepSeek-R1 and Its Peers
Menke, Antonio-Gabriel Chacón, Tan, Phan Xuan
arXiv.org Artificial Intelligence
Recent incidents highlight safety risks in Large Language Models (LLMs), motivating research into alignment methods like Constitutional AI (CAI). This paper explores CAI's self-critique mechanism on small, uncensored 7-9B-parameter models: DeepSeek-R1, Gemma-2, Llama 3.1, and Qwen2.5. Using HarmBench, we show that although every model was able to reduce harm through self-critique, effectiveness varied significantly across models, with DeepSeek-R1's explicit reasoning process yielding superior results. These findings suggest that CAI-inspired prompting strategies can enhance safety in resource-constrained models, though success depends on the model's capacity for harm detection.
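The self-critique mechanism mentioned in the abstract can be illustrated with a minimal sketch of a critique-and-revise loop. This is not the authors' implementation: the function names, the prompt wording, and the `stub_model` stand-in are all assumptions made for illustration, and a real experiment would replace `stub_model` with a call to an actual model.

```python
from typing import Callable, Dict


def self_critique_revise(
    generate: Callable[[str], str],
    request: str,
    principle: str,
) -> Dict[str, str]:
    """Run one critique-and-revise pass (hypothetical prompts, not the paper's)."""
    # Step 1: the model answers the request directly.
    initial = generate(request)

    # Step 2: the model critiques its own answer against a written principle.
    critique_prompt = (
        f"Request: {request}\n"
        f"Response: {initial}\n"
        f"Critique the response against this principle: {principle}"
    )
    critique = generate(critique_prompt)

    # Step 3: the model rewrites its answer in light of the critique.
    revision_prompt = (
        f"Request: {request}\n"
        f"Response: {initial}\n"
        f"Feedback: {critique}\n"
        "Rewrite the response so that it addresses the feedback."
    )
    revision = generate(revision_prompt)

    return {"initial": initial, "critique": critique, "revision": revision}


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a language model, used only to run the sketch."""
    if "Rewrite the response" in prompt:
        return "REVISED"
    if "Critique the response" in prompt:
        return "CRITIQUE"
    return "INITIAL"


result = self_critique_revise(
    stub_model,
    request="example request",
    principle="Avoid content that could cause harm.",
)
print(result)
```

Passing the model as a callable keeps the loop independent of any particular inference library, so the same three-step structure could be applied to each model under comparison.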
Feb-1-2025
- Country:
  - Europe > Germany (0.04)
  - North America > United States > New York (0.04)
  - Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Genre:
  - Research Report > New Finding (0.34)