MIP against Agent: Malicious Image Patches Hijacking Multimodal OSAgents
Neural Information Processing Systems
Large language models (LLMs) and vision-language models (VLMs) have demonstrated remarkable capabilities, driving significant advances across a wide range of applications. These models are typically fine-tuned to align with specific objectives, such as being "helpful and harmless" [39]. However, recent work on adversarial attacks has shown that carefully crafted inputs can bypass these alignment safeguards [65, 10, 4, 26, 52]. Although such attacks can elicit harmful responses, the output is usually confined to text that is not directly actionable, which limits the scope of possible harm. Malicious text outputs are concerning, but it remains unclear whether the associated risks exceed those posed by information already accessible through the internet [18].
Jun-15-2026, 08:25:41 GMT
- Country:
  - Europe
    - United Kingdom (0.45)
    - Austria (0.28)
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (0.92)
- Industry:
  - Information Technology > Security & Privacy (1.00)
- Technology: