Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Xuannan Liu

Neural Information Processing Systems 

The increasing deployment of Large Vision-Language Models (LVLMs) raises safety concerns about their behavior under potentially malicious inputs. However, existing multimodal safety evaluations focus primarily on model vulnerabilities exposed by static image inputs, ignoring the temporal dynamics of video that may induce distinct safety risks. To bridge this gap, we introduce Video-SafetyBench, the first comprehensive benchmark designed to evaluate the safety of LVLMs under video-text attacks.