Towards Calibrated Robust Fine-Tuning of Vision-Language Models

Neural Information Processing Systems 

Foundation models [6] such as CLIP [47] have been widely applied across diverse domains via the pretrain-finetune approach.