An Empirical Analysis of VLM-based OOD Detection: Mechanisms, Advantages, and Sensitivity

Lee, Yuxiao, Cao, Xiaofeng, Ye, Wei, Yao, Jiangchao, Song, Jingkuan, Shen, Heng Tao

Sep-18-2025, arXiv.org Artificial Intelligence

Vision-Language Models (VLMs), such as CLIP, have demonstrated remarkable zero-shot out-of-distribution (OOD) detection capabilities, which are vital for reliable AI systems. Despite this promising capability, a comprehensive understanding of (1) why they work so effectively, (2) what advantages they have over single-modal methods, and (3) how robust their behavior is remains notably incomplete within the research community. This paper presents a systematic empirical analysis of VLM-based OOD detection using in-distribution (ID) and OOD prompts. (1) Mechanisms: We systematically characterize and formalize key operational properties within the VLM embedding space that facilitate zero-shot OOD detection. (2) Advantages: We empirically quantify the superiority of these models over established single-modal approaches, attributing this advantage to the VLM's capacity to leverage rich semantic novelty. (3) Sensitivity: We uncover a significant and previously under-explored asymmetry in their robustness profile: while resilient to common image noise, these VLM-based methods are highly sensitive to prompt phrasing. Our findings contribute a more structured understanding of the strengths and critical vulnerabilities inherent in VLM-based OOD detection, offering empirically grounded guidance for developing more robust and reliable future designs.
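
To make the idea of scoring with ID and OOD prompts concrete, the following is a minimal sketch and not the paper's method. It assumes image and prompt embeddings are already available as plain vectors (the toy 3-dimensional vectors stand in for real CLIP embeddings), and it assumes one common scoring choice: a temperature-scaled softmax over cosine similarities, with the probability mass on ID prompts used as the ID score. The function names `cosine` and `id_score` and the `temperature` value are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def id_score(image_emb, id_prompt_embs, ood_prompt_embs, temperature=0.01):
    """Softmax probability mass assigned to the ID prompts (assumed scoring rule).

    A high value suggests the image is in-distribution; a low value
    suggests it matches the OOD prompts better.
    """
    sims = [cosine(image_emb, p) for p in id_prompt_embs + ood_prompt_embs]
    logits = [s / temperature for s in sims]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    return sum(exps[:len(id_prompt_embs)]) / sum(exps)

# Toy embeddings (hypothetical): two ID class prompts and one OOD prompt.
id_prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
ood_prompts = [[0.0, 0.0, 1.0]]

score_id_like = id_score([0.9, 0.1, 0.0], id_prompts, ood_prompts)
score_ood_like = id_score([0.1, 0.0, 0.9], id_prompts, ood_prompts)
print(score_id_like, score_ood_like)
```

In this sketch an image would be flagged as OOD when its score falls below a chosen threshold. Because the score depends directly on the prompt embeddings, changing the prompt wording changes the score, which is the kind of prompt sensitivity the abstract describes.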

data mining, large language model, machine learning, (20 more...)

Country:
- Europe > Switzerland (0.28)

Genre:
- Research Report > New Finding (0.66)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Data Science > Data Mining (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language
      - Text Processing (1.00)
      - Large Language Model (0.87)
    - Machine Learning
      - Performance Analysis > Accuracy (1.00)
      - Neural Networks (1.00)
