AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents
Li, Yanjie, Cao, Yiming, Wang, Dong, Xiao, Bin
arXiv.org Artificial Intelligence
Abstract: Multimodal agents built on large vision-language models (LVLMs) are increasingly deployed in open-world settings but remain highly vulnerable to prompt injection, especially through visual inputs. We introduce AgentTypo, a black-box red-teaming framework that mounts adaptive typographic prompt injection by embedding optimized text into webpage images. Our automatic typographic prompt injection (ATPI) algorithm maximizes prompt reconstruction by substituting captioners while minimizing human detectability via a stealth loss, with a Tree-structured Parzen Estimator guiding black-box optimization over text placement, size, and color. To further enhance attack strength, we develop AgentTypo-pro, a multi-LLM system that iteratively refines injection prompts using evaluation feedback and retrieves successful past examples for continual learning. Effective prompts are abstracted into generalizable strategies and stored in a strategy repository, enabling progressive knowledge accumulation and reuse in future attacks. Experiments on the VWA-Adv benchmark across Classifieds, Shopping, and Reddit scenarios show that AgentTypo significantly outperforms the latest image-based attacks such as AgentAttack. On GPT-4o agents, our image-only attack raises the success rate from 23% to 45%, with consistent results across GPT-4V, GPT-4o-mini, Gemini 1.5 Pro, and Claude 3 Opus. In image+text settings, AgentTypo achieves 68% ASR, also outperforming the latest baselines. Our findings reveal that AgentTypo poses a practical and potent threat to multimodal agents and highlight the urgent need for effective defense.

As the reasoning capabilities of large vision language models (LVLMs) [1]-[5] continue to advance, increasingly powerful agents have been constructed based on these models [6]-[12]. These multimodal agents incorporate both textual and visual information, such as webpage screenshots, into agent frameworks, significantly enhancing their performance across various tasks, transforming LVLMs from conversational assistants into autonomous production tools. This evolution has the potential to enhance productivity and streamline both personal and professional workflows. However, recent research has highlighted that agents built on LLMs and LVLMs are susceptible to prompt injection attacks, particularly due to their interactions with open-world data such as untrusted web pages [13]-[16].
Oct-7-2025