Negative Label Guided OOD Detection with Pretrained Vision-Language Models
Jiang, Xue, Liu, Feng, Fang, Zhen, Chen, Hong, Liu, Tongliang, Zheng, Feng, Han, Bo
arXiv.org Artificial Intelligence
Out-of-distribution (OOD) detection aims to identify samples from unknown classes and plays a crucial role in making models trustworthy against errors on unexpected inputs. Extensive research has been dedicated to OOD detection in the vision modality. Vision-language models (VLMs) can leverage both textual and visual information for various multi-modal applications, yet few OOD detection methods take information from the text modality into account. In this paper, we propose a novel post hoc OOD detection method, called NegLabel, which draws a vast number of negative labels from extensive corpus databases. We design a novel scheme for the OOD score that incorporates these negative labels, and provide a theoretical analysis to explain their mechanism. Extensive experiments demonstrate that NegLabel achieves state-of-the-art performance on various OOD detection benchmarks and generalizes well across multiple VLM architectures. Furthermore, NegLabel exhibits remarkable robustness against diverse domain shifts.

In open-world scenarios, deploying machine learning models faces a critical challenge: handling data from unknown classes, commonly referred to as out-of-distribution (OOD) data (Hendrycks & Gimpel, 2017). OOD data can cause models to exhibit overconfidence, potentially resulting in severe errors or security risks. This issue is particularly pronounced in critical applications such as autonomous vehicles and medical diagnosis. Detecting and rejecting OOD data therefore plays a crucial role in ensuring a model's reliability and safety. Traditional visual OOD detection methods (Hsu et al., 2020a; Wang et al., 2021b; Huang et al., 2021; Sun et al., 2021; Wang et al., 2021a) typically rely solely on image information, ignoring the rich textual information carried by labels. VLMs can leverage multimodal information, which also benefits OOD detection.
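The negative-label idea above can be sketched as follows: given an image's cosine similarities to the ID label embeddings and to a pool of negative-label embeddings, score the image by how much softmax mass falls on the ID side. This is a minimal illustration of the mechanism, not the authors' exact formulation; the function name, the temperature value, and the similarity numbers are assumptions for the sketch.

```python
import numpy as np

def neglabel_style_score(sim_id, sim_neg, tau=0.01):
    """Share of temperature-scaled softmax mass on ID labels vs. negative labels.

    sim_id:  cosine similarities between the image and the ID label texts
    sim_neg: cosine similarities between the image and the negative label texts
    A high score suggests the image is ID; a low score suggests OOD.
    (A sketch of the NegLabel idea, not the paper's exact score.)
    """
    logits = np.concatenate([np.asarray(sim_id), np.asarray(sim_neg)]) / tau
    logits -= logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return probs[: len(sim_id)].sum()

# Illustrative similarities: an ID image aligns with ID labels,
# an OOD image aligns with some negative labels instead.
s_in = neglabel_style_score([0.30, 0.10], [0.05, 0.04])
s_out = neglabel_style_score([0.06, 0.05], [0.31, 0.28])
```

Because the score is a ratio of softmax masses, it stays in [0, 1], so a single threshold can separate ID from OOD inputs.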
Some recently proposed methods attempt to design dedicated OOD detectors for VLMs. Specifically, ZOC (Esmaeilpour et al., 2022) defines a new task, zero-shot OOD detection, and uses a trainable captioner to generate candidate OOD labels to match OOD images. However, on large-scale datasets with many in-distribution (ID) classes, such as ImageNet-1k, the captioner may fail to generate effective candidate OOD labels, resulting in poor performance. MCM (Ming et al., 2022a) uses the maximum scaled-softmax probability over the ID labels to identify OOD images. However, MCM employs only information from the ID label space and does not effectively exploit the text-interpretation capabilities of VLMs.
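The MCM score mentioned above can be written in a few lines: take the image's cosine similarities to the ID label embeddings, apply a temperature-scaled softmax, and use the maximum probability as the ID-ness score. This is a sketch of that scheme; the temperature and similarity values here are illustrative assumptions.

```python
import numpy as np

def mcm_score(sim_id, tau=0.01):
    """Maximum Concept Matching (MCM) style score: the maximum
    temperature-scaled softmax probability over ID label similarities.
    Confident (peaked) matches score near 1; ambiguous (flat) ones score low.
    """
    logits = np.asarray(sim_id) / tau
    logits -= logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return probs.max()

# A likely-ID image matches one label strongly; a likely-OOD image
# matches all ID labels about equally, so its softmax is flat.
s_peaked = mcm_score([0.30, 0.05, 0.05])
s_flat = mcm_score([0.10, 0.10, 0.10])
```

Note that the score depends only on the ID label space, which is exactly the limitation NegLabel addresses by also consulting negative labels.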
Mar-29-2024