Understanding the Vulnerability of CLIP to Image Compression

Chen, Cangxiong, Namboodiri, Vinay P., Padget, Julian

Nov-23-2023 – arXiv.org Artificial Intelligence

CLIP is a widely used foundational vision-language model for zero-shot image recognition and other image-text alignment tasks. We demonstrate that CLIP is vulnerable to changes in image quality under compression. We analyse this surprising result further using an attribution method, Integrated Gradients. With this method, we are able to better understand, both quantitatively and qualitatively, how compression affects the zero-shot recognition accuracy of the model. We evaluate this extensively on CIFAR-10 and STL-10. Our work provides a basis for understanding this vulnerability of CLIP and can help develop more effective methods to improve the robustness of CLIP and other vision-language models.
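
The abstract names Integrated Gradients as the attribution method but does not spell it out. As a rough illustration only, the sketch below applies the standard Riemann-sum approximation of Integrated Gradients to a toy sigmoid-of-linear scorer. The toy model, its weights, the all-zero baseline, and the step count are all assumptions made for this example; none of it is the paper's CLIP setup or code. The one thing it shows is the completeness property: the attributions sum (approximately) to the difference between the model output at the input and at the baseline.

```python
import math

def model(x, w, b):
    """Toy differentiable scorer (hypothetical stand-in, not CLIP): sigmoid(w.x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def model_grad(x, w, b):
    """Analytic gradient of the toy scorer with respect to each input feature."""
    s = model(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    IG_i = (x_i - baseline_i) * average over alpha in (0, 1) of
           dF/dx_i evaluated at baseline + alpha * (x - baseline).
    """
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]

# Hypothetical weights and input, chosen only for illustration.
w = [1.5, -2.0, 0.5]
b = 0.1
x = [0.8, 0.3, 0.6]
baseline = [0.0, 0.0, 0.0]  # assumed all-zero reference input

attributions = integrated_gradients(x, baseline, w, b)
gap = model(x, w, b) - model(baseline, w, b)

print("attributions:", attributions)
print("sum of attributions:", sum(attributions))
print("F(x) - F(baseline):", gap)
```

In an image setting, the input would be the pixel array, the gradient would come from automatic differentiation rather than a closed form, and the resulting per-pixel attributions could be compared between an original image and its compressed versions.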

Country:
- North America
  - United States
    - Florida > Broward County
      - Fort Lauderdale (0.04)
    - California > San Diego County
      - San Diego (0.04)
  - Canada > Ontario
    - Toronto (0.14)

Genre:
- Research Report (0.50)
- Overview (0.46)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Natural Language (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (0.68)