A Framework for Evaluating Vision-Language Model Safety: Building Trust in AI for Public Sector Applications
Rashid, Maisha Binte, Rivas, Pablo
Vision-Language Models (VLMs) are increasingly deployed in public sector missions, necessitating robust evaluation of their safety and vulnerability to adversarial attacks. This paper introduces a novel framework to quantify adversarial risks in VLMs. We analyze model performance under Gaussian, salt-and-pepper, and uniform noise, identifying misclassification thresholds and deriving composite noise patches and saliency patterns that highlight vulnerable regions. These patterns are compared against the Fast Gradient Sign Method (FGSM) to assess their adversarial effectiveness. We propose a new Vulnerability Score that combines the impact of random noise and adversarial attacks, providing a comprehensive metric for evaluating model robustness.
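To make the evaluation pipeline described in the abstract concrete, the sketch below shows one way such a robustness check could be wired up in PyTorch: images are perturbed with Gaussian, salt-and-pepper, or uniform noise at increasing strengths to locate a misclassification threshold, a standard one-step FGSM attack is run for comparison, and the two thresholds are folded into an illustrative composite score. The tiny linear classifier, the noise levels, the helper names, and the weighting inside `vulnerability_score` are all assumptions made for demonstration and are not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def add_gaussian_noise(x, std):
    """Additive zero-mean Gaussian noise, clipped to the valid [0, 1] range."""
    return torch.clamp(x + torch.randn_like(x) * std, 0.0, 1.0)


def add_salt_pepper_noise(x, amount):
    """Flip a fraction `amount` of pixels to pure black (pepper) or white (salt)."""
    noisy = x.clone()
    mask = torch.rand_like(x)
    noisy[mask < amount / 2] = 0.0
    noisy[mask > 1 - amount / 2] = 1.0
    return noisy


def add_uniform_noise(x, scale):
    """Additive uniform noise in [-scale, scale], clipped to [0, 1]."""
    return torch.clamp(x + (torch.rand_like(x) - 0.5) * 2 * scale, 0.0, 1.0)


def fgsm_attack(model, x, y, eps):
    """Standard one-step FGSM: step along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return torch.clamp(x_adv + eps * x_adv.grad.sign(), 0.0, 1.0).detach()


def misclassification_threshold(model, x, y, perturb_fn, levels):
    """Smallest perturbation level at which the prediction flips, else None."""
    for level in levels:
        pred = model(perturb_fn(x, level)).argmax(dim=1)
        if (pred != y).any():
            return level
    return None


def vulnerability_score(noise_threshold, adv_threshold, alpha=0.5):
    """Hypothetical composite score: lower thresholds imply higher vulnerability.
    The equal weighting and reciprocal normalization are illustrative choices."""
    noise_term = 1.0 / noise_threshold if noise_threshold else 0.0
    adv_term = 1.0 / adv_threshold if adv_threshold else 0.0
    return alpha * noise_term + (1 - alpha) * adv_term


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy stand-in classifier; a real study would evaluate a VLM's vision encoder.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    model.eval()

    x = torch.rand(1, 3, 32, 32)
    y = model(x).argmax(dim=1)  # treat the clean prediction as the reference label

    levels = [0.01, 0.05, 0.1, 0.2, 0.4]
    gauss_thr = misclassification_threshold(model, x, y, add_gaussian_noise, levels)
    fgsm_thr = misclassification_threshold(
        model, x, y, lambda img, eps: fgsm_attack(model, img, y, eps), levels)

    print("Gaussian misclassification threshold:", gauss_thr)
    print("FGSM misclassification threshold:", fgsm_thr)
    print("Illustrative vulnerability score:", vulnerability_score(gauss_thr, fgsm_thr))
```

The same loop could swap in `add_salt_pepper_noise` or `add_uniform_noise` to cover the other noise families mentioned above; how the per-noise thresholds are aggregated and weighted against the FGSM result is a design choice left open here.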
arXiv.org Artificial Intelligence
Feb-22-2025
- Country:
  - North America > United States (0.69)
- Genre:
  - Research Report > New Finding (0.47)
- Industry:
  - Government (1.00)
  - Information Technology > Security & Privacy (0.93)
- Technology:
  - Information Technology > Artificial Intelligence
    - Issues > Social & Ethical Issues (0.65)
    - Machine Learning (1.00)
    - Natural Language (1.00)
    - Vision (1.00)