Frustratingly Easy Test-Time Adaptation of Vision-Language Models Matteo Farina 1, Giovanni Iacca 1 Massimiliano Mancini 1
–Neural Information Processing Systems
Vision-Language Models seamlessly discriminate among arbitrary semantic categories, yet they still suffer from poor generalization when presented with challenging examples. For this reason, Episodic Test-Time Adaptation (TTA) strategies have recently emerged as powerful techniques to adapt VLMs in the presence of a single unlabeled image. The recent literature on TTA is dominated by the paradigm of prompt tuning by Marginal Entropy Minimization, which, relying on online backpropagation, inevitably slows down inference while increasing memory. In this work, we theoretically investigate the properties of this approach and unveil that a surprisingly strong TTA method lies dormant and hidden within it.
Neural Information Processing Systems
Mar-27-2025, 13:18:55 GMT
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Research Report
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks (1.00)
- Natural Language (1.00)
- Representation & Reasoning (1.00)
- Vision (1.00)
- Information Technology > Artificial Intelligence