Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks
Maan Qraitem, Nazia Tasnim, Kate Saenko, Bryan A. Plummer
arXiv.org Artificial Intelligence
Recently, significant progress has been made on Large Vision-Language Models (LVLMs), a new class of vision-language (VL) models that make use of large pre-trained language models. Yet their vulnerability to typographic attacks, which involve superimposing misleading text onto an image, remains unstudied. Furthermore, prior typographic attacks rely on sampling a random misleading class from a predefined set of classes, and the randomly chosen class might not be the most effective attack. To address these issues, we first introduce a novel benchmark designed specifically to test the vulnerability of LVLMs to typographic attacks. We then introduce a new and more effective typographic attack: the Self-Generated typographic attack. Given an image, our method makes use of the strong language capabilities of models like GPT-4V by simply prompting them to recommend a typographic attack. Using our benchmark, we find that typographic attacks represent a significant threat to LVLMs. We also find that typographic attacks recommended by GPT-4V using our method are more effective than prior attacks not only against GPT-4V itself, but also against a host of less capable yet popular open-source models such as LLaVA, InstructBLIP, and MiniGPT4.
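To make the contrast between the two attack styles concrete, the following is a minimal Python sketch, not the authors' implementation. The function names, the class list, and the prompt wording are all hypothetical; the abstract does not give the actual prompt. Rendering the chosen text onto the image and querying a model are omitted.

```python
import random


def random_class_attack(true_class, classes, rng=random):
    """Baseline described as prior work: sample a random misleading class."""
    candidates = [c for c in classes if c != true_class]
    if not candidates:
        raise ValueError("need at least one class other than the true one")
    return rng.choice(candidates)


def build_self_generated_prompt(true_class, classes):
    """Build a prompt asking an LVLM to recommend the attack text.

    The wording is an illustrative assumption, not the paper's prompt.
    """
    options = ", ".join(c for c in classes if c != true_class)
    return (
        f"The image shows a {true_class}. "
        "Which one of these classes, written as text on the image, "
        f"would most likely be mistaken for the correct answer: {options}? "
        "Reply with the class name only."
    )


# Hypothetical example classes.
classes = ["cat", "dog", "fox"]
attack_text = random_class_attack("cat", classes, random.Random(0))
prompt = build_self_generated_prompt("cat", classes)
```

In the random baseline the misleading text is chosen without looking at the image, whereas in the self-generated variant the model's reply to `prompt` (sent together with the image) would supply the text to superimpose.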
Feb-1-2024