WATT: Weight Average Test-Time Adaptation of CLIP

Gustavo A. Vargas Hakim, Moslem Yazdanpanah, Ali Bahri, Milad Cheraghalikhani, Sahar Dastani, Farzad Beizaee, Ismail Ben Ayed, Christian Desrosiers

Neural Information Processing Systems 

Vision-Language Models (VLMs) such as CLIP have achieved unprecedented performance in zero-shot image classification, yet their generalization capability can still be seriously challenged when they are confronted with domain shifts.