Vision-Language Models are Strong Noisy Label Detectors

May-27-2025, 04:13:08 GMT–Neural Information Processing Systems

Recent research on fine-tuning vision-language models has demonstrated impressive performance in various downstream tasks. However, the challenge of obtaining accurately labeled data in real-world applications poses a significant obstacle during the fine-tuning process. To address this challenge, this paper presents a Denoising Fine-Tuning framework, called DeFT, for adapting vision-language models. The proposed framework establishes a noisy label detector by learning positive and negative textual prompts for each class. The positive prompt seeks to reveal distinctive features of the class, while the negative prompt serves as a learnable threshold for separating clean and noisy samples.
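The detection rule described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes image and prompt embeddings have already been produced by some vision-language encoder, and it treats a sample as clean when its similarity to the positive prompt of its given label exceeds its similarity to that class's negative prompt. The function names (`l2_normalize`, `detect_clean`) and the toy embeddings are hypothetical and chosen only for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def detect_clean(image_feats, pos_text_feats, neg_text_feats, labels):
    """Flag samples whose given label looks clean (hypothetical helper).

    image_feats:    (N, D) image embeddings
    pos_text_feats: (C, D) embeddings of the per-class positive prompts
    neg_text_feats: (C, D) embeddings of the per-class negative prompts
    labels:         (N,)   possibly noisy integer class labels
    Returns a boolean array of shape (N,); True means "treated as clean".
    """
    img = l2_normalize(np.asarray(image_feats, dtype=float))
    pos = l2_normalize(np.asarray(pos_text_feats, dtype=float))
    neg = l2_normalize(np.asarray(neg_text_feats, dtype=float))
    labels = np.asarray(labels)
    rows = np.arange(len(labels))

    # Similarity of each image to the prompts of its *given* label only.
    sim_pos = (img @ pos.T)[rows, labels]
    sim_neg = (img @ neg.T)[rows, labels]

    # Assumption: the negative prompt acts as a per-class threshold.
    return sim_pos > sim_neg


# Toy 2-D embeddings for two classes (made-up values, not real model outputs).
pos_prompts = np.array([[1.0, 0.0], [0.0, 1.0]])
neg_prompts = np.array([[1.0, 1.0], [1.0, 1.0]])
images = np.array([[1.0, 0.1], [1.0, 0.1], [0.1, 1.0]])
given_labels = np.array([0, 1, 1])  # the second label is deliberately wrong

is_clean = detect_clean(images, pos_prompts, neg_prompts, given_labels)
print(is_clean)  # [ True False  True]
```

In this toy setup the second image lies close to the class 0 direction but carries label 1, so its positive-prompt similarity falls below the negative-prompt threshold and it is flagged as noisy. In the framework described above both prompts are learned, whereas here they are fixed vectors.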

artificial intelligence, natural language, strong noisy label detector

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language (0.89)