Does Chain-of-Thought Reasoning Really Reduce Harmfulness from Jailbreaking?

Lu, Chengda; Fan, Xiaoyu; Huang, Yu; Xu, Rongwu; Li, Jijie; Xu, Wei

May 26, 2025, arXiv.org Artificial Intelligence

Jailbreak attacks have been observed to largely fail against recent reasoning models enhanced by Chain-of-Thought (CoT) reasoning. However, the underlying mechanism remains underexplored, and relying solely on reasoning capacity may raise security concerns. In this paper, we try to answer the question: Does CoT reasoning really reduce harmfulness from jailbreaking? Through rigorous theoretical analysis, we demonstrate that CoT reasoning has dual effects on jailbreaking harmfulness. Based on the theoretical insights, we propose a novel jailbreak method, FicDetail, whose practical performance validates our theoretical findings.

large language model, machine learning, natural language

Country:
- Asia > China (0.28)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government (1.00)
- Energy (1.00)
- Law (0.92)
- Materials > Chemicals
  - Commodity Chemicals (0.68)

Technology:
- Information Technology
  - Security & Privacy (0.92)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (0.97)
    - Machine Learning > Neural Networks
      - Deep Learning (0.97)