The dark deep side of DeepSeek: Fine-tuning attacks against the safety alignment of CoT-enabled models

Zhiyuan Xu, Joseph Gardiner, Sana Belguith

arXiv.org Artificial Intelligence 

As one of the few Chain-of-Thought (CoT) reasoning models, and notably the first open-source implementation of its kind, DeepSeek-R1 has demonstrated remarkable performance on complex reasoning tasks. Experimental results show that DeepSeek-R1 not only achieves CoT reasoning but also significantly reduces computational resource requirements [1]. It has furthermore outperformed comparable models, such as ChatGPT-o1, on certain benchmarks, demonstrating a clear performance advantage. However, while the CoT approach significantly enhances reasoning capabilities, it also raises security concerns that warrant attention. Driven by scaling laws, the volume of data used to train LLMs has reached unprecedented levels. Although extensive methods are applied to sanitize the data during collection and filtering [2], technical limitations and resource constraints mean that a considerable amount of harmful content remains in the training data.
