How Effective Is Constitutional AI in Small LLMs? A Study on DeepSeek-R1 and Its Peers

Menke, Antonio-Gabriel Chacón, Tan, Phan Xuan

arXiv.org Artificial Intelligence 

Recent incidents highlight safety risks in Large Language Models (LLMs), motivating research into alignment methods like Constitutional AI (CAI). This paper explores CAI's self-critique mechanism on small, uncensored 7-9B parameter models: DeepSeek-R1, Gemma-2, Llama 3.1, and Qwen2.5. Using HarmBench, we demonstrate that while all models showed capacity for harm reduction through self-critique, effectiveness varied significantly, with DeepSeek-R1's explicit reasoning process yielding superior results. These findings suggest that CAI-inspired prompting strategies can enhance safety in resource-constrained models, though success depends on the model's capacity for harm detection.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found