A Framework for Evaluating Vision-Language Model Safety: Building Trust in AI for Public Sector Applications

Feb-22-2025–arXiv.org Artificial Intelligence

Vision-Language Models (VLMs) are increasingly deployed in public sector missions, necessitating robust evaluation of their safety and vulnerability to adversarial attacks. This paper introduces a novel framework to quantify adversarial risks in VLMs. We analyze model performance under Gaussian, salt-and-pepper, and uniform noise, identifying misclassification thresholds and deriving composite noise patches and saliency patterns that highlight vulnerable regions. These patterns are compared against the Fast Gradient Sign Method (FGSM) to assess their adversarial effectiveness. We propose a new Vulnerability Score that combines the impact of random noise and adversarial attacks, providing a comprehensive metric for evaluating model robustness.
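The noise sweep and the FGSM comparison described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the noise levels, the majority-vote threshold rule, and the toy logistic model standing in for a VLM are hypothetical, and the paper's own models, settings, and Vulnerability Score formula are not given in this abstract and are not reproduced here.

```python
import numpy as np

def gaussian_noise(img, level, rng):
    """Add zero-mean Gaussian noise with standard deviation `level`."""
    return np.clip(img + rng.normal(0.0, level, img.shape), 0.0, 1.0)

def salt_and_pepper_noise(img, level, rng):
    """Replace a fraction `level` of pixels with 0 or 1 at random."""
    out = img.copy()
    mask = rng.random(img.shape) < level
    out[mask] = rng.integers(0, 2, size=int(mask.sum())).astype(img.dtype)
    return out

def uniform_noise(img, level, rng):
    """Add noise drawn uniformly from [-level, level]."""
    return np.clip(img + rng.uniform(-level, level, img.shape), 0.0, 1.0)

def misclassification_threshold(predict, img, label, noise_fn, levels, rng, trials=20):
    """Return the first level at which most noisy copies are misclassified.

    `predict` is any callable mapping an image to a class label (hypothetical
    stand-in for a real model). Returns None if no level flips the majority.
    """
    for level in levels:
        wrong = sum(predict(noise_fn(img, level, rng)) != label for _ in range(trials))
        if wrong > trials / 2:
            return level
    return None

def fgsm_linear(img, w, b, label, eps):
    """One FGSM step against a toy logistic model p = sigmoid(sum(w * img) + b).

    The cross-entropy gradient with respect to the input is (p - label) * w,
    so no autodiff is needed. A real VLM would need a framework gradient.
    """
    p = 1.0 / (1.0 + np.exp(-(np.sum(w * img) + b)))
    grad = (p - label) * w
    return np.clip(img + eps * np.sign(grad), 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dark = np.zeros((32, 32))                    # toy image, true label 0
    predict = lambda x: int(x.mean() > 0.1)      # toy brightness classifier
    levels = [0.0, 0.1, 0.2, 0.8, 1.0]           # illustrative sweep only
    for fn in (gaussian_noise, salt_and_pepper_noise, uniform_noise):
        print(fn.__name__, misclassification_threshold(predict, dark, 0, fn, levels, rng))
```

Comparing the smallest random-noise level that flips a prediction with the smallest FGSM `eps` that does the same gives a rough sense of how much more efficient a gradient-guided perturbation is than unstructured noise, which is the kind of comparison the abstract describes.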

Country:
- North America > United States (0.69)

Genre:
- Research Report > New Finding (0.47)

Industry:
- Government (1.00)
- Information Technology > Security & Privacy (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Issues > Social & Ethical Issues (0.65)
  - Machine Learning (1.00)
  - Natural Language (1.00)
  - Vision (1.00)