The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense

Jun-14-2026, 11:52:13 GMT–Neural Information Processing Systems

The vulnerability of Vision Large Language Models (VLLMs) to jailbreak attacks comes as no surprise. However, recent defense mechanisms against these attacks have reached near-saturation performance on benchmark evaluations, often with minimal effort. This dual high performance in both attack and defense gives rise to a fundamental and perplexing paradox. To gain a deep understanding of this issue and thus help strengthen the trustworthiness of VLLMs, this paper makes three key contributions: i) One tentative explanation for VLLMs being prone to jailbreak attacks, namely the inclusion of vision inputs, together with an in-depth analysis.

large language model, machine learning, natural language

Country:
- Asia (0.67)
- North America > United States (0.46)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.92)

Industry:
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government
  - North America Government > United States Government (0.46)

Technology:
- Information Technology
  - Security & Privacy (1.00)
  - Artificial Intelligence
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
