From Evaluation to Defense: Advancing Safety in Video Large Language Models

Sun, Yiwei, Jiang, Peiqi, Liu, Chuanbin, Lin, Luohao, Lu, Zhiying, Xie, Hongtao

May-23-2025–arXiv.org Artificial Intelligence

While the safety risks of image-based large language models have been extensively studied, their video-based counterparts (Video LLMs) remain critically under-examined. To systematically study this problem, we introduce VideoSafetyBench (VSB-77k), the first large-scale, culturally diverse benchmark for Video LLM safety, which comprises 77,646 video-query pairs and spans 19 principal risk categories across 10 language communities. We reveal that integrating the video modality degrades safety performance by an average of 42.3%, exposing systemic risks in multimodal attack exploitation. To address this vulnerability, we propose VideoSafety-R1, a dual-stage framework achieving unprecedented safety gains through two innovations: (1) Alarm Token-Guided Safety Fine-Tuning (AT-SFT) injects learnable alarm tokens into visual and textual sequences, enabling explicit harm perception across modalities via multitask objectives. (2) Safety-Guided GRPO then enhances defensive reasoning through dynamic policy optimization with rule-based rewards derived from dual-modality verification. Together, these components shift safety alignment from passive harm recognition to active reasoning. The resulting framework achieves a 65.1% improvement on VSB-Eval-HH, and improves by 59.1%, 44.3%, and 15.0% on the image safety datasets MMBench, VLGuard, and FigStep, respectively. Our code is available in the supplementary materials. Warning: This paper contains examples of harmful language and videos, and reader discretion is recommended.
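
The abstract mentions rule-based rewards derived from dual-modality verification feeding a GRPO stage. As a rough illustration only, the sketch below pairs a hypothetical rule-based reward with the group-relative advantage normalization that GRPO is generally known for (each sampled response's reward is standardized against the other responses to the same prompt). The function names, the three reward rules, and the use of the population standard deviation are assumptions made for this example; the abstract does not specify the paper's actual reward design.

```python
from statistics import mean, pstdev


def rule_based_reward(pred_video_harmful, pred_text_harmful,
                      label_video_harmful, label_text_harmful, refused):
    """Hypothetical reward: the rules below are illustrative assumptions,
    not the reward used in the paper."""
    reward = 0.0
    # One point for each modality whose predicted harm flag matches the label.
    reward += 1.0 if pred_video_harmful == label_video_harmful else 0.0
    reward += 1.0 if pred_text_harmful == label_text_harmful else 0.0
    # One point if the response behaves appropriately: refuse when either
    # modality is labeled harmful, answer otherwise.
    should_refuse = label_video_harmful or label_text_harmful
    reward += 1.0 if refused == should_refuse else 0.0
    return reward


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of responses to the same prompt.
    Assumes population standard deviation; eps avoids division by zero."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled responses to one video-query pair, already scored.
advantages = group_relative_advantages([3.0, 1.0, 2.0, 2.0])
```

Responses scoring above the group mean receive a positive advantage and those below receive a negative one, so no separate learned value model is needed to rank samples within a group.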

large language model, machine learning, qwen2


Country:
- Asia > China (0.28)

Genre:
- Research Report (0.50)

Industry:
- Information Technology > Security & Privacy (1.00)
- Law Enforcement & Public Safety
  - Terrorism (1.00)
  - Crime Prevention & Enforcement (1.00)
  - Fraud (0.92)
- Law
  - Criminal Law (1.00)
  - Civil Rights & Constitutional Law (1.00)
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Consumer Health (1.00)
  - Therapeutic Area > Psychiatry/Psychology
    - Mental Health (1.00)
    - Addiction Disorder (1.00)
- Government > Military
  - Cyberwarfare (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
