Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks

Maan Qraitem, Nazia Tasnim, Kate Saenko, Bryan A. Plummer

Feb-1-2024–arXiv.org Artificial Intelligence

Recently, significant progress has been made on Large Vision-Language Models (LVLMs), a new class of VL models that make use of large pre-trained language models. Yet their vulnerability to typographic attacks, which involve superimposing misleading text onto an image, remains unstudied. Furthermore, prior typographic attacks rely on sampling a random misleading class from a predefined set of classes, and the randomly chosen class might not be the most effective attack. To address these issues, we first introduce a novel benchmark uniquely designed to test LVLMs' vulnerability to typographic attacks. We then introduce a new and more effective typographic attack: Self-Generated typographic attacks. Given an image, our method makes use of the strong language capabilities of models like GPT-4V by simply prompting them to recommend a typographic attack. Using our novel benchmark, we find that typographic attacks represent a significant threat against LVLMs. Furthermore, typographic attacks recommended by GPT-4V using our new method are more effective than prior-work attacks, not only against GPT-4V itself but also against a host of less capable yet popular open-source models like LLaVA, InstructBLIP, and MiniGPT4.
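The abstract contrasts two ways of choosing the misleading text for a typographic attack: prior work samples a random class from a predefined set, while the proposed method asks the model itself to recommend one. The Python sketch below illustrates only that contrast, under stated assumptions: `ask_model` is a hypothetical stand-in for a call to an LVLM such as GPT-4V (a real call would also send the image), the prompt wording is invented for illustration and is not the authors' prompt, and rendering the chosen text onto the image is left out.

```python
import random
from typing import Callable, Sequence

# Hypothetical interface: sends a text prompt to an LVLM (e.g. GPT-4V) and
# returns its text reply. A real call would also attach the image.
AskModel = Callable[[str], str]


def random_class_attack(true_label: str, classes: Sequence[str],
                        rng: random.Random) -> str:
    """Prior-work style baseline: pick a misleading class uniformly at random."""
    candidates = [c for c in classes if c != true_label]
    return rng.choice(candidates)


def self_generated_attack(true_label: str, classes: Sequence[str],
                          ask_model: AskModel, rng: random.Random) -> str:
    """Ask the model itself which misleading class to superimpose on the image."""
    options = ", ".join(c for c in classes if c != true_label)
    # Illustrative prompt wording only (an assumption, not the paper's prompt).
    prompt = (
        f"This image shows a {true_label}. Which of these classes would be the "
        f"most misleading if written as text on the image: {options}? "
        "Answer with the class name only."
    )
    reply = ask_model(prompt).strip().lower()
    # Accept the reply only if it names a valid class other than the true one.
    for c in classes:
        if c != true_label and c.lower() == reply:
            return c
    # Otherwise fall back to the random baseline.
    return random_class_attack(true_label, classes, rng)


if __name__ == "__main__":
    classes = ["dog", "wolf", "car", "banana"]
    rng = random.Random(0)
    fake_model = lambda prompt: " Wolf\n"  # stub standing in for GPT-4V
    print(self_generated_attack("dog", classes, fake_model, rng))  # prints "wolf"
    print(random_class_attack("dog", classes, rng))  # some class other than "dog"
```

The fallback to random sampling is a design choice of this sketch, so the function always returns a usable attack string even when the model's reply does not match any class.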

language model, lvlm, typographic attack

Country:
- North America > Canada > Ontario > Toronto (0.04)

Genre:
- Research Report (0.82)

Industry:
- Information Technology > Security & Privacy (0.35)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Large Language Model (0.87)