Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models

Wang, Shuoyuan, Li, Yixuan, Wei, Hongxin

Oct-3-2024–arXiv.org Artificial Intelligence

Confidence calibration is critical for the safe deployment of machine learning models in the real world. However, this issue has not been fully addressed in vision-language models like CLIP, particularly after fine-tuning. In this work, we demonstrate that existing prompt tuning methods usually lead to a trade-off in calibration between base and new classes: the cross-entropy loss in CoOp causes overconfidence on new classes by increasing textual label divergence, whereas the regularization of KgCoOp maintains the confidence level but results in underconfidence on base classes due to the improved accuracy. Inspired by these observations, we introduce Dynamic Outlier Regularization (DOR) to ensure confidence calibration on both base and new classes after fine-tuning. In particular, we propose to minimize the feature deviation of novel textual labels (instead of base classes) sampled from a large vocabulary. In effect, DOR prevents the increase in textual divergence for new labels while easing restrictions on base classes. Extensive experiments demonstrate that DOR can enhance the calibration performance of current fine-tuning methods on base and new classes.

Large pre-trained vision-language models (VLMs) like CLIP (Radford et al., 2021) have become the de facto standard in today's zero-shot tasks, including image recognition (Wortsman et al., 2022), open-vocabulary segmentation (Liang et al., 2023), and knowledge-augmented retrieval (Ming & Li, 2024). To transfer pre-trained CLIP knowledge to domain-specific downstream tasks efficiently, various parameter-efficient fine-tuning (PEFT) techniques, including prompt tuning (Zhou et al., 2022b) and adapters (Gao et al., 2024), have been proposed. Despite the promising improvements in accuracy, reliability issues such as confidence calibration in fine-tuned VLMs have been largely overlooked. Without a full understanding of this miscalibration, deploying fine-tuned VLMs can exacerbate safety concerns in high-stakes applications like medical diagnosis and autonomous driving.
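The notions of overconfidence and underconfidence discussed above can be made concrete with expected calibration error (ECE), a widely used calibration metric. The following is a minimal sketch, not the authors' evaluation code: the function name, the equal-width binning, the signed gap, and the toy inputs are all assumptions chosen for illustration.

```python
from typing import Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> Tuple[float, float]:
    """Return (ece, signed_gap) for a set of predictions.

    ece        : bin-weighted mean of |accuracy - confidence|.
    signed_gap : mean confidence minus overall accuracy; a positive value
                 suggests overconfidence, a negative one underconfidence.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Per-bin running totals: summed confidence, correct count, sample count.
    conf_sum = [0.0] * n_bins
    hit_sum = [0] * n_bins
    count = [0] * n_bins
    for c, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 falls in the last bin.
        b = min(int(c * n_bins), n_bins - 1)
        conf_sum[b] += c
        hit_sum[b] += int(ok)
        count[b] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum[b] / count[b]
        acc = hit_sum[b] / count[b]
        ece += (count[b] / n) * abs(acc - avg_conf)

    signed_gap = sum(confidences) / n - sum(int(ok) for ok in correct) / n
    return ece, signed_gap


# Toy data (hypothetical): high confidence with 50% accuracy,
# then modest confidence with 100% accuracy.
over = expected_calibration_error([0.9] * 4, [True, False, True, False])
under = expected_calibration_error([0.6] * 4, [True, True, True, True])
print("overconfident  (ece, gap):", over)
print("underconfident (ece, gap):", under)
```

In this toy setup both cases have a similar ECE, and only the sign of the gap tells them apart. That is why a trade-off between base and new classes can stay hidden when ECE is reported as a single aggregate number.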

base class, calibration, new class


