Vision-Language Models are Strong Noisy Label Detectors
–Neural Information Processing Systems
Recent research on fine-tuning vision-language models has demonstrated impressive performance in various downstream tasks. However, the challenge of obtaining accurately labeled data in real-world applications poses a significant obstacle during the fine-tuning process. To address this challenge, this paper presents a Denoising Fine-Tuning framework, called DeFT, for adapting vision-language models. The proposed framework establishes a noisy label detector by learning positive and negative textual prompts for each class. The positive prompt seeks to reveal distinctive features of the class, while the negative prompt serves as a learnable threshold for separating clean and noisy samples.
Neural Information Processing Systems
May-27-2025, 04:13:08 GMT
- Technology:
- Information Technology > Artificial Intelligence
- Natural Language (0.89)
- Vision (1.00)
- Information Technology > Artificial Intelligence