From Evaluation to Defense: Advancing Safety in Video Large Language Models
Sun, Yiwei, Jiang, Peiqi, Liu, Chuanbin, Lin, Luohao, Lu, Zhiying, Xie, Hongtao
arXiv.org Artificial Intelligence
While the safety risks of image-based large language models have been extensively studied, their video-based counterparts (Video LLMs) remain critically under-examined. To systematically study this problem, we introduce VideoSafetyBench (VSB-77k), the first large-scale, culturally diverse benchmark for Video LLM safety, which comprises 77,646 video-query pairs and spans 19 principal risk categories across 10 language communities. We reveal that integrating the video modality degrades safety performance by an average of 42.3%, exposing systemic risks in multimodal attack exploitation. To address this vulnerability, we propose VideoSafety-R1, a dual-stage framework achieving unprecedented safety gains through two innovations: (1) Alarm Token-Guided Safety Fine-Tuning (AT-SFT) injects learnable alarm tokens into visual and textual sequences, enabling explicit harm perception across modalities via multitask objectives. (2) Safety-Guided GRPO then enhances defensive reasoning through dynamic policy optimization with rule-based rewards derived from dual-modality verification. These components synergize to shift safety alignment from passive harm recognition to active reasoning. The resulting framework achieves a 65.1% improvement on VSB-Eval-HH, and improves by 59.1%, 44.3%, and 15.0% on the image safety datasets MMBench, VLGuard, and FigStep, respectively. Our code is available in the supplementary materials. Warning: This paper contains examples of harmful language and videos, and reader discretion is recommended.
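The abstract does not spell out the reward rules or the optimization details, so the following is only a minimal sketch of how rule-based rewards could feed a GRPO-style update. The `Verdict` fields, the one-point-per-check scoring in `rule_reward`, and the example labels are all hypothetical and are not taken from the paper. The only standard ingredient is the group-relative advantage, which standardizes rewards within a group of responses sampled for the same prompt.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Verdict:
    """Hypothetical parsed output for one sampled response."""
    video_harmful: bool  # model's judgment of the video
    text_harmful: bool   # model's judgment of the text query
    refused: bool        # whether the final answer was a refusal


def rule_reward(pred: Verdict, video_label: bool, text_label: bool) -> float:
    """Hypothetical rule-based reward: one point per check that passes."""
    should_refuse = video_label or text_label
    score = 0.0
    score += 1.0 if pred.video_harmful == video_label else 0.0
    score += 1.0 if pred.text_harmful == text_label else 0.0
    score += 1.0 if pred.refused == should_refuse else 0.0
    return score


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of sampled responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt: harmful video, benign text (assumed labels).
group = [
    Verdict(video_harmful=True, text_harmful=False, refused=True),
    Verdict(video_harmful=False, text_harmful=False, refused=False),
    Verdict(video_harmful=True, text_harmful=True, refused=True),
    Verdict(video_harmful=True, text_harmful=False, refused=False),
]
rewards = [rule_reward(v, video_label=True, text_label=False) for v in group]
advantages = group_relative_advantages(rewards)
```

Under these assumptions, responses that pass more checks than the group average receive a positive advantage and the rest a negative one, so no separate value model is needed to rank samples.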
May-23-2025
- Genre:
  - Research Report (0.50)
- Industry:
  - Information Technology > Security & Privacy (1.00)
  - Law Enforcement & Public Safety
    - Terrorism (1.00)
    - Crime Prevention & Enforcement (1.00)
    - Fraud (0.92)
  - Law
    - Criminal Law (1.00)
    - Civil Rights & Constitutional Law (1.00)
  - Health & Medicine
    - Pharmaceuticals & Biotechnology (1.00)
    - Consumer Health (1.00)
    - Therapeutic Area > Psychiatry/Psychology
      - Mental Health (1.00)
      - Addiction Disorder (1.00)
  - Government > Military
    - Cyberwarfare (0.67)
- Technology: