Energy: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization
Neural Information Processing Systems
Recent approaches for vision-language models (VLMs) have shown remarkable success in fast downstream adaptation. When applied to real-world downstream tasks, VLMs inevitably encounter both in-distribution (ID) and out-of-distribution (OOD) data. OOD data often include both covariate shifts (e.g., known classes with changed image styles) and semantic shifts (e.g., classes unseen until test time). This highlights the importance of improving VLMs' generalization to covariate-shifted OOD data while effectively detecting open-set, semantic-shifted OOD classes. In this paper, we observe a substantial energy change in closed-set data when the vision-language modalities are re-aligned, specifically by directly reducing the maximum cosine similarity to a low value. Inspired by this observation, we introduce a novel OOD score, named Energy.
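The abstract only names the mechanism (lower the maximum image-text cosine similarity, then look at how much the energy changes), so the following is a minimal sketch of that idea under stated assumptions, not the paper's exact score. It assumes the standard free-energy form E = -T · log Σ_i exp(s_i / T) over per-class cosine similarities s_i, a temperature T = 0.01 (roughly CLIP's logit scale of 100), a "low value" of 0.0, and the sign convention E(reduced) − E(original). The helper names `energy` and `energy_change` are hypothetical; the paper's formulation, constants, and sign may differ.

```python
import math
from typing import Sequence


def energy(cos_sims: Sequence[float], temperature: float = 0.01) -> float:
    """Free energy -T * log(sum_i exp(s_i / T)) over class cosine similarities."""
    scaled = [s / temperature for s in cos_sims]
    m = max(scaled)  # shift by the max so the exponentials cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return -temperature * log_sum_exp


def energy_change(
    cos_sims: Sequence[float], low_value: float = 0.0, temperature: float = 0.01
) -> float:
    """Hypothetical energy-change score (assumed form, not the paper's definition).

    Replaces the largest cosine similarity with `low_value` (assumed 0.0) and
    returns E(reduced) - E(original). Lowering a similarity can only raise the
    energy, so for low_value below the maximum the result is positive.
    """
    sims = list(cos_sims)
    top = max(range(len(sims)), key=sims.__getitem__)  # index of the max similarity
    reduced = sims.copy()
    reduced[top] = low_value
    return energy(reduced, temperature) - energy(sims, temperature)


if __name__ == "__main__":
    id_like = [0.35, 0.20, 0.18]   # one class clearly dominates (peaked)
    ood_like = [0.22, 0.21, 0.20]  # no class stands out (flat)
    print(f"ID-like change:  {energy_change(id_like):.4f}")
    print(f"OOD-like change: {energy_change(ood_like):.4f}")
```

With these made-up similarity vectors, the peaked, ID-like input changes by roughly 0.149 while the flat, OOD-like input changes by roughly 0.011. This matches the intuition in the abstract that closed-set data show a large energy change under this re-alignment, so a larger change would be read as "more likely ID". The log-sum-exp shift is used only for numerical stability at small temperatures.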
Jun-18-2026, 11:52:47 GMT
- Genre:
- Research Report > Experimental Study (1.00)
- Overview (0.65)
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Representation & Reasoning (1.00)
- Natural Language (1.00)
- Machine Learning
- Performance Analysis > Accuracy (1.00)
- Neural Networks (0.93)