Energy: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OODGeneralization

Neural Information Processing Systems 

Recent approaches for vision-language models (VLMs) have shown remarkable success in achieving fast downstream adaptation. When applied to real-world downstream tasks, VLMs inevitably encounter both the in-distribution (ID) data and out-of-distribution (OOD) data. The OOD datasets often include both covariate shifts (e.g., known classes with changes in image styles) and semantic shifts (e.g., test-time unseen classes). This highlights the importance of improving VLMs' generalization ability to covariate-shifted OOD data, while effectively detecting open-set semantic-shifted OOD classes. In this paper, inspired by the substantial energy change observed in closed-set data when re-aligning vision-language modalities--specifically by directly reducing the maximum cosine similarity to a low value--we introduce a novel OOD score, named Energy.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found