PLA: Prompt Learning Attack against Text-to-Image Generative Models

Lyu, Xinqi, Liu, Yihao, Li, Yanjie, Xiao, Bin

Aug-7-2025–arXiv.org Artificial Intelligence

T ext-to-Image (T2I) models have gained widespread adoption across various applications. Despite the success, the potential misuse of T2I models poses significant risks of generating Not-Safe-F or-W ork (NSFW) content. T o investigate the vulnerability of T2I models, this paper delves into adversarial attacks to bypass the safety mechanisms under black-box settings. Most previous methods rely on word substitution to search adversarial prompts. Due to limited search space, this leads to suboptimal performance compared to gradient-based training. However, black-box settings present unique challenges to training gradient-driven attack methods, since there is no access to the internal architecture and parameters of T2I models. T o facilitate the learning of adversarial prompts in black-box settings, we propose a novel prompt learning attack framework ( PLA), where insightful gradient-based training tailored to black-box T2I models is designed by utilizing multimodal similarities. Experiments show that our new method can effectively attack the safety mechanisms of black-box T2I models including prompt filters and post-hoc safety checkers with a high success rate compared to state-of-the-art methods. W arning: This paper may contain offensive model-generated content.

machine learning, natural language, safety mechanism, (17 more...)

arXiv.org Artificial Intelligence

Aug-7-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States > Minnesota (0.28)

Genre:
- Research Report > Promising Solution (0.34)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.70)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found