A Critical Evaluation of Defenses against Prompt Injection Attacks

Jia, Yuqi, Shao, Zedian, Liu, Yupei, Jia, Jinyuan, Song, Dawn, Gong, Neil Zhenqiang

May-27-2025–arXiv.org Artificial Intelligence

Large Language Models (LLMs) are vulnerable to prompt injection attacks, and several defenses have recently been proposed, often claiming to mitigate these attacks successfully. However, we argue that existing studies lack a principled approach to evaluating these defenses. In this paper, we argue the need to assess defenses across two critical dimensions: (1) effectiveness, measured against both existing and adaptive prompt injection attacks involving diverse target and injected prompts, and (2) general-purpose utility, ensuring that the defense does not compromise the foundational capabilities of the LLM. Our critical evaluation reveals that prior studies have not followed such a comprehensive evaluation methodology. When assessed using this principled approach, we show that existing defenses are not as successful as previously reported. This work provides a foundation for evaluating future defenses and guiding their development. Our code and data are available at: https://github.com/PIEval123/PIEval.

effectiveness, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

May-27-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report > New Finding (0.68)

Industry:
- Information Technology > Security & Privacy (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (0.75)
    - Performance Analysis > Accuracy (0.47)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found