Task Bias in Vision-Language Models

Sachit Menon, Ishaan Preetam Chandratreya, Carl Vondrick

arXiv.org Artificial Intelligence 

Incidental supervision from language has become a popular approach for learning generic visual representations that can be prompted to perform many recognition tasks in computer vision. We conduct an in-depth exploration of the CLIP model and show that its visual representation is often strongly biased towards solving some tasks more than others. Moreover, which task the representation will be biased towards is unpredictable, with little consistency across images. To resolve this task bias, we show how to learn a visual prompt that guides the representation towards features relevant to the task of interest. Our results show that these visual prompts can be independent of the input image and still effectively provide a conditioning mechanism to steer visual representations towards the desired task.
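To make the conditioning idea concrete, the sketch below shows one way an input-independent visual prompt could be learned with a frozen CLIP model in PyTorch: a single additive prompt tensor is shared across all images and optimized against a task-specific objective. The additive-prompt form, the toy class names, and the training loop are illustrative assumptions, not the authors' exact recipe.

```python
# Minimal sketch: learning an input-independent visual prompt for a frozen CLIP model.
# Assumes OpenAI's `clip` package; the additive prompt and toy task are hypothetical.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model.float().eval()
for p in model.parameters():
    p.requires_grad_(False)  # CLIP stays frozen; only the prompt is trained

# One learnable prompt shared across all inputs (image-independent conditioning).
prompt = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([prompt], lr=1e-2)

# Text embeddings describing the task of interest (hypothetical class names).
class_names = ["a photo of a cat", "a photo of a dog"]
with torch.no_grad():
    text_feats = model.encode_text(clip.tokenize(class_names).to(device))
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

def training_step(images, labels):
    """images: preprocessed batch (B, 3, 224, 224); labels: class indices."""
    prompted = images.to(device) + prompt           # steer the input with the prompt
    img_feats = model.encode_image(prompted)
    img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
    logits = 100.0 * img_feats @ text_feats.T       # CLIP-style scaled cosine similarity
    loss = torch.nn.functional.cross_entropy(logits, labels.to(device))
    optimizer.zero_grad()
    loss.backward()                                 # gradients flow only into `prompt`
    optimizer.step()
    return loss.item()
```

At test time the same learned prompt would simply be added to every incoming image before encoding, so the conditioning requires no per-image computation.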
