Adapting Contrastive Language-Image Pretrained (CLIP) Models for Out-of-Distribution Detection

Adaloglou, Nikolas, Michels, Felix, Kaiser, Tim, Kollmann, Markus

Nov-9-2023–arXiv.org Artificial Intelligence

We present a comprehensive experimental study on pretrained feature extractors for visual out-of-distribution (OOD) detection, focusing on adapting contrastive language-image pretrained (CLIP) models. Without fine-tuning on the training data, we are able to establish a positive correlation ($R^2\geq0.92$) between in-distribution classification and unsupervised OOD detection for CLIP models in $4$ benchmarks. We further propose a new simple and scalable method called \textit{pseudo-label probing} (PLP) that adapts vision-language models for OOD detection. Given a set of label names of the training set, PLP trains a linear layer using the pseudo-labels derived from the text encoder of CLIP. To test the OOD detection robustness of pretrained models, we develop a novel feature-based adversarial OOD data manipulation approach to create adversarial samples. Intriguingly, we show that (i) PLP outperforms the previous state-of-the-art \citep{ming2022mcm} on all $5$ large-scale benchmarks based on ImageNet, specifically by an average AUROC gain of 3.4\% using the largest CLIP model (ViT-G), (ii) we show that linear probing outperforms fine-tuning by large margins for CLIP architectures (i.e. CLIP ViT-H achieves a mean gain of 7.3\% AUROC on average on all ImageNet-based benchmarks), and (iii) billion-parameter CLIP models still fail at detecting adversarially manipulated OOD images. The code and adversarially created datasets will be made publicly available.

benchmark, detection, international conference, (13 more...)

arXiv.org Artificial Intelligence

Nov-9-2023

arXiv.org PDF

Add feedback

Country:
- North America
  - United States > California
    - San Diego County > San Diego (0.04)
  - Canada > Alberta
    - Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
- Europe
  - Greece (0.04)
  - United Kingdom > England
    - Bristol (0.04)
  - Germany > North Rhine-Westphalia
    - Düsseldorf Region > Düsseldorf (0.04)
- Asia > Middle East
  - Israel > Tel Aviv District > Tel Aviv (0.04)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology
  - Data Science > Data Mining (0.93)
  - Sensing and Signal Processing > Image Processing (0.93)
  - Artificial Intelligence
    - Vision (1.00)
    - Natural Language (1.00)
    - Machine Learning
      - Statistical Learning (1.00)
      - Neural Networks > Deep Learning (0.68)
      - Performance Analysis > Accuracy (0.68)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found