Adapting Contrastive Language-Image Pretrained (CLIP) Models for Out-of-Distribution Detection
Adaloglou, Nikolas; Michels, Felix; Kaiser, Tim; Kollmann, Markus
arXiv.org Artificial Intelligence
We present a comprehensive experimental study of pretrained feature extractors for visual out-of-distribution (OOD) detection, focusing on adapting contrastive language-image pretrained (CLIP) models. Without fine-tuning on the training data, we establish a positive correlation ($R^2\geq0.92$) between in-distribution classification and unsupervised OOD detection for CLIP models on $4$ benchmarks. We further propose a simple and scalable method called \textit{pseudo-label probing} (PLP) that adapts vision-language models for OOD detection. Given the label names of the training set, PLP trains a linear layer on pseudo-labels derived from the text encoder of CLIP. To test the OOD detection robustness of pretrained models, we develop a novel feature-based adversarial OOD data manipulation approach to create adversarial samples. Intriguingly, we show that (i) PLP outperforms the previous state of the art \citep{ming2022mcm} on all $5$ large-scale benchmarks based on ImageNet, specifically by an average AUROC gain of 3.4\% using the largest CLIP model (ViT-G), (ii) linear probing outperforms fine-tuning by large margins for CLIP architectures (e.g., CLIP ViT-H achieves a mean AUROC gain of 7.3\% across all ImageNet-based benchmarks), and (iii) billion-parameter CLIP models still fail at detecting adversarially manipulated OOD images. The code and adversarially created datasets will be made publicly available.
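The pseudo-label probing idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: random vectors stand in for frozen CLIP image and text embeddings, the linear layer is fit with plain full-batch gradient descent, and the OOD score (negative maximum logit) is an assumed scoring rule that the abstract does not specify. All function names (`pseudo_labels`, `train_linear_probe`, `ood_score`, `auroc`, `demo`) are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit Euclidean norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def pseudo_labels(image_feats, text_feats):
    """Assign each image the class whose text embedding is most cosine-similar."""
    sims = l2_normalize(image_feats) @ l2_normalize(text_feats).T
    return sims.argmax(axis=1)

def train_linear_probe(feats, labels, num_classes, lr=0.5, epochs=200):
    """Fit a softmax linear layer on frozen features (full-batch gradient descent)."""
    n, d = feats.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # d(cross-entropy)/d(logits)
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def ood_score(feats, W, b):
    """Higher means more OOD. Negative max logit is an ASSUMED scoring rule."""
    return -(feats @ W + b).max(axis=1)

def auroc(scores_in, scores_out):
    """Probability that a random OOD sample scores above a random in-distribution one."""
    diff = scores_out[:, None] - scores_in[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def demo(seed=0):
    """Synthetic stand-in for CLIP features: class prototypes plus small noise."""
    rng = np.random.default_rng(seed)
    num_classes, dim = 3, 32
    text_feats = rng.normal(size=(num_classes, dim))      # stand-in text embeddings
    true = rng.integers(0, num_classes, size=400)
    in_feats = l2_normalize(text_feats[true] + 0.05 * rng.normal(size=(400, dim)))
    out_feats = l2_normalize(rng.normal(size=(200, dim)))  # unrelated directions
    train, held_out = in_feats[:300], in_feats[300:]
    labels = pseudo_labels(train, text_feats)              # no ground truth used
    W, b = train_linear_probe(train, labels, num_classes)
    score = auroc(ood_score(held_out, W, b), ood_score(out_feats, W, b))
    return float((labels == true[:300]).mean()), score

if __name__ == "__main__":
    print(demo())
```

In this sketch the ground-truth labels are used only to check pseudo-label quality; training relies solely on labels derived from the text embeddings, which mirrors the setting the abstract describes.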
Nov-9-2023
- Country:
  - North America
    - United States > California
      - San Diego County > San Diego (0.04)
    - Canada > Alberta
  - Europe
    - Greece (0.04)
    - United Kingdom > England
      - Bristol (0.04)
    - Germany > North Rhine-Westphalia
      - Düsseldorf Region > Düsseldorf (0.04)
  - Asia > Middle East
    - Israel > Tel Aviv District > Tel Aviv (0.04)
- Genre:
  - Research Report > New Finding (0.34)
- Technology:
  - Information Technology
    - Data Science > Data Mining (0.93)
    - Sensing and Signal Processing > Image Processing (0.93)
    - Artificial Intelligence
      - Vision (1.00)
      - Natural Language (1.00)
      - Machine Learning
        - Statistical Learning (1.00)
        - Neural Networks > Deep Learning (0.68)
        - Performance Analysis > Accuracy (0.68)