Towards Visualization-of-Thought Jailbreak Attack against Large Visual Language Models

Jun-17-2026, 21:14:54 GMT–Neural Information Processing Systems

As Visual Language Models (VLMs) continue to evolve, they demonstrate increasingly sophisticated logical reasoning and multimodal thought generation, opening the door to widespread applications. This advancement, however, raises serious content-security concerns, particularly when the models process complex multimodal inputs that require intricate reasoning. Previous work on these safety challenges has often overlooked the competition between the logical reasoning and safety objectives of VLMs. In this paper, we introduce the Visualization-of-Thought Attack (VoTA), a novel, automated attack framework that strategically constructs chains of images with risky visual thoughts to challenge victim models.

large language model, machine learning, natural language

Genre:
- Workflow (1.00)
- Overview (0.67)
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.97)
