MIP against Agent: Malicious Image Patches Hijacking Multimodal OSAgents

Jun-15-2026, 08:25:41 GMT–Neural Information Processing Systems

Large language models (LLMs) and vision-language models (VLMs) have demonstrated remarkable capabilities, driving significant advancements across a wide range of applications. These models are typically fine-tuned to align with specific objectives, such as being "helpful and harmless" [39]. However, recent work on adversarial attacks has demonstrated that carefully crafted inputs can bypass these alignment safeguards [65, 10, 4, 26, 52]. While such adversarial attacks can elicit harmful responses, the output is usually constrained to text that is not directly actionable, limiting the scope of possible harm. While malicious text outputs are concerning, it remains unclear whether the associated risks exceed those posed by information already accessible through the internet [18].

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Jun-15-2026, 08:25:41 GMT

Conferences PDF

Add feedback

Country:
- Europe
  - United Kingdom (0.45)
  - Austria (0.28)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.92)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology
  - Security & Privacy (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Natural Language > Large Language Model (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (0.94)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found