Noise-Tolerant Unsupervised Adapter for Vision-Language Models

Ali, Eman, Guan, Dayan, Lu, Shijian, Elsaddik, Abdulmotaleb

Sep-26-2023–arXiv.org Artificial Intelligence

Recent advances in large-scale vision-language models have achieved very impressive performance in various zero-shot image classification tasks. While prior studies have demonstrated significant improvements by introducing few-shot labelled target samples, they still require labelling of target samples, which greatly degrades their scalability while handling various visual recognition tasks. We design NtUA, a Noise-tolerant Unsupervised Adapter that allows learning superior target models with few-shot unlabelled target samples. NtUA works as a key-value cache that formulates visual features and predicted pseudo-labels of the few-shot unlabelled target samples as key-value pairs. It consists of two complementary designs. The first is adaptive cache formation, which combats pseudo-label noise by weighting the key-value pairs according to their prediction confidence. The second is pseudo-label rectification, which corrects both pair values (i.e., pseudo-labels) and cache weights by leveraging knowledge distillation from large-scale vision-language models. Extensive experiments show that NtUA achieves superior performance consistently across multiple widely adopted benchmarks.

Figure 1: Unlike key-value cache from labelled samples in supervised methods [52, 32], we build weighted key-value cache from unlabelled samples, where the cache weights are determined by the confidence of the pseudo-labels predicted
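The confidence-weighted key-value cache described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of the maximum softmax probability as the confidence, and the dot-product affinity between normalised features are all assumptions made for illustration, and the toy features and logits are invented. Pseudo-label rectification via knowledge distillation is not shown.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalize(vec):
    # Scale a feature vector to unit L2 norm.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def build_weighted_cache(features, zero_shot_logits):
    # Hypothetical helper: one (key, value, weight) entry per unlabelled sample.
    # key = normalised visual feature, value = pseudo-label (argmax class),
    # weight = prediction confidence (assumed here: max softmax probability).
    cache = []
    for feat, logits in zip(features, zero_shot_logits):
        probs = softmax(logits)
        confidence = max(probs)
        pseudo_label = probs.index(confidence)
        cache.append((normalize(feat), pseudo_label, confidence))
    return cache

def cache_logits(query, cache, num_classes):
    # Hypothetical helper: each entry votes for its pseudo-label, scaled by
    # its confidence weight and its cosine similarity to the query feature.
    q = normalize(query)
    scores = [0.0] * num_classes
    for key, label, weight in cache:
        similarity = sum(a * b for a, b in zip(q, key))
        scores[label] += weight * similarity
    return scores

# Toy data (invented): three unlabelled samples, two classes.
features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
zero_shot_logits = [[4.0, 0.0], [0.0, 4.0], [0.2, 0.0]]

cache = build_weighted_cache(features, zero_shot_logits)
scores = cache_logits([1.0, 0.0], cache, num_classes=2)
```

In this toy run the third sample has an almost flat zero-shot prediction, so it enters the cache with a lower weight and contributes less to the class score than the confidently predicted first sample, which is the intuition behind down-weighting noisy pseudo-labels.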

artificial intelligence, image understanding, machine learning



Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.88)
  - Vision > Image Understanding (0.70)
