Output Length Effect on DeepSeek-R1's Safety in Forced Thinking

Xuying Li, Zhuo Li, Yuji Kosuga, Victor Bian

arXiv.org Artificial Intelligence 

Large Language Models (LLMs) have demonstrated strong reasoning capabilities, but their safety under adversarial conditions remains a challenge. This study examines how output length affects the robustness of DeepSeek-R1, particularly in Forced Thinking scenarios. Analyzing responses across a range of adversarial prompts, we find that while longer outputs can improve safety through self-correction, certain attack types exploit extended generations. Our findings suggest that output length should be controlled dynamically to balance reasoning effectiveness and security. We propose reinforcement learning-based policy adjustments and adaptive token length regulation to enhance LLM safety.
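The adaptive token length regulation the abstract proposes can be illustrated with a minimal sketch: a prompt-risk score is mapped to a generation-length cap, giving low-risk prompts room for the self-correction that longer outputs enable while tightening the budget where extended generations are exploitable. The code below is a hypothetical illustration, not the authors' implementation; `adaptive_token_budget` and the assumption that a risk score in [0, 1] is available from an external classifier are inventions for this sketch.

```python
# A minimal sketch of adaptive token length regulation. Assumptions:
# a prompt-risk score in [0, 1] comes from some external classifier;
# the linear mapping and the budget bounds are illustrative choices,
# not values from the paper.

def adaptive_token_budget(risk: float,
                          min_budget: int = 128,
                          max_budget: int = 4096) -> int:
    """Map a prompt-risk score in [0, 1] to a generation-length cap.

    Low-risk prompts receive a longer budget, leaving room for
    self-correction; high-risk prompts receive a tighter cap, since
    some attack types exploit extended generations.
    """
    risk = max(0.0, min(1.0, risk))  # clamp defensively
    # Linear interpolation: risk 0 -> max_budget, risk 1 -> min_budget.
    return int(max_budget - risk * (max_budget - min_budget))


if __name__ == "__main__":
    for risk in (0.0, 0.5, 0.9):
        print(f"risk={risk:.1f} -> max_new_tokens={adaptive_token_budget(risk)}")
```

In a deployment, the returned budget would be passed as the `max_new_tokens` (or equivalent) decoding parameter; the reinforcement learning-based policy adjustment the abstract also proposes would learn such a mapping rather than hard-coding it.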