ΔEnergy: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization

Jun-18-2026, 11:52:47 GMT–Neural Information Processing Systems

Recent approaches for vision-language models (VLMs) have shown remarkable success in fast downstream adaptation. When applied to real-world downstream tasks, VLMs inevitably encounter both in-distribution (ID) and out-of-distribution (OOD) data. OOD data often include both covariate shifts (e.g., known classes with changed image styles) and semantic shifts (e.g., unseen classes encountered at test time). This highlights the importance of improving VLMs' generalization to covariate-shifted OOD data while effectively detecting open-set, semantic-shifted OOD classes. In this paper, inspired by the substantial energy change observed in closed-set data when re-aligning the vision and language modalities (specifically, by directly reducing the maximum cosine similarity to a low value), we introduce a novel OOD score named ΔEnergy.
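The abstract describes the score only in words, so the following is a minimal sketch under loudly stated assumptions, not the authors' implementation. It assumes a CLIP-style energy score E = -T · log Σᵢ exp(sᵢ / T) computed over the image-text cosine similarities sᵢ, a "low value" of 0.0, and a temperature T = 0.01; all three choices, and the function names `energy` and `delta_energy`, are hypothetical. The score is the rise in energy after the maximum cosine similarity is replaced by the low value.

```python
import math
from typing import List, Sequence


def energy(similarities: Sequence[float], temperature: float = 0.01) -> float:
    """Energy score over temperature-scaled cosine similarities:
    E = -T * log(sum_i exp(s_i / T)). Lower energy means more ID-like."""
    scaled = [s / temperature for s in similarities]
    peak = max(scaled)  # subtract the max for a numerically stable log-sum-exp
    lse = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return -temperature * lse


def delta_energy(
    similarities: Sequence[float],
    low_value: float = 0.0,
    temperature: float = 0.01,
) -> float:
    """Hypothetical energy-change score: set the maximum cosine similarity
    to `low_value` and return how much the energy rises as a result."""
    sims: List[float] = list(similarities)
    top = max(range(len(sims)), key=sims.__getitem__)  # index of the best-matching class
    realigned = sims.copy()
    realigned[top] = low_value  # "re-align" by suppressing the top match
    return energy(realigned, temperature) - energy(sims, temperature)


if __name__ == "__main__":
    # Made-up similarities for illustration only (not from the paper).
    id_like = [0.35, 0.20, 0.18]   # one clearly dominant class
    ood_like = [0.24, 0.23, 0.22]  # no class stands out
    print(f"ID-like  delta E = {delta_energy(id_like):.4f}")
    print(f"OOD-like delta E = {delta_energy(ood_like):.4f}")
```

Under these assumptions, the ID-like input, whose top similarity dominates, shows the larger energy change, so a larger value would be read as "more likely ID". The abstract does not state the paper's exact sign convention, low value, or temperature.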

detection, machine learning, natural language

Country:
- Asia > China (0.28)

Genre:
- Research Report > Experimental Study (1.00)
- Overview (0.65)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language (1.00)
  - Machine Learning
    - Performance Analysis > Accuracy (1.00)
    - Neural Networks (0.93)