SafeCoT: Improving VLM Safety with Minimal Reasoning

Ma, Jiachen; Zhou, Zhanhui; Yang, Chao; Lu, Chaochao

Jun-12-2025–arXiv.org Artificial Intelligence

Ensuring safe and appropriate responses from vision-language models (VLMs) remains a critical challenge, particularly in high-risk or ambiguous scenarios. We introduce SafeCoT, a lightweight, interpretable framework that leverages rule-based chain-of-thought (CoT) supervision to improve refusal behavior in VLMs. Unlike prior methods that rely on large-scale safety annotations or complex modeling, SafeCoT uses minimal supervision to help models reason about safety risks and make context-aware refusals. Experiments across multiple benchmarks show that SafeCoT significantly reduces overrefusal and enhances generalization, even with limited training data. Our approach offers a scalable solution for aligning VLMs with safety-critical objectives.
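The abstract does not specify SafeCoT's actual rules or prompt format. The sketch below is therefore only an illustration under stated assumptions. It shows what a minimal rule-based chain-of-thought supervision target might look like: a short reasoning chain that names the triggered safety rule, or states that none applies, followed by a refuse/answer decision. It also shows how over-refusal (the share of benign requests that get refused) can be measured.

Everything named here is hypothetical and not taken from the paper: the risk categories in `RULES`, the `Example` type, the use of a text caption as a stand-in for the image, `build_cot_target`, and `over_refusal_rate`.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical rule set (an assumption for illustration, NOT SafeCoT's rules):
# each risk category maps to a one-line justification used in the reasoning chain.
RULES = {
    "self_harm": "The request could facilitate self-harm.",
    "weapons": "The request seeks instructions for building a weapon.",
    "privacy": "The request asks to identify a private person in the image.",
}


@dataclass
class Example:
    image_caption: str   # hypothetical text stand-in for the visual input
    question: str
    risk: Optional[str]  # None means the request is benign


def build_cot_target(ex: Example) -> str:
    """Compose a minimal rule-based reasoning chain followed by a decision."""
    steps = [f"Image: {ex.image_caption}.", f"Request: {ex.question}"]
    if ex.risk is None:
        # No rule fires: the target teaches the model to answer, not refuse.
        steps.append("No safety rule applies, so the request is benign.")
        steps.append("Decision: answer.")
    else:
        # A rule fires: the target cites it explicitly before refusing.
        steps.append(f"Rule triggered ({ex.risk}): {RULES[ex.risk]}")
        steps.append("Decision: refuse.")
    return "\n".join(steps)


def over_refusal_rate(decisions: List[str], labels: List[Optional[str]]) -> float:
    """Fraction of benign prompts (label None) that received a 'refuse' decision."""
    benign = [d for d, label in zip(decisions, labels) if label is None]
    if not benign:
        return 0.0
    return sum(d == "refuse" for d in benign) / len(benign)


if __name__ == "__main__":
    ex = Example("a kitchen knife on a cutting board", "How do I dice an onion?", None)
    print(build_cot_target(ex))
    # Two benign prompts, one wrongly refused -> 0.5
    print(over_refusal_rate(["refuse", "answer", "refuse"], [None, None, "weapons"]))
```

In this sketch, the benign example shows why an explicit reasoning step can matter for over-refusal. A knife appears in the image, but no rule fires, so the supervised target is to answer. A method that keys only on surface cues might refuse instead.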

arXiv preprint, large language model, machine learning

Genre:
- Research Report (0.40)

Industry:
- Information Technology > Security & Privacy (0.93)
- Health & Medicine
  - Consumer Health (0.93)
  - Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Machine Learning (1.00)
  - Natural Language > Large Language Model (0.48)
  - Representation & Reasoning > Rule-Based Reasoning (0.36)
