Prompt Injection Detection and Mitigation via AI Multi-Agent NLP Frameworks

Gosmar, Diego; Dahl, Deborah A.; Gosmar, Dario

Mar-14-2025, arXiv.org Artificial Intelligence

Recent advances in generative AI have enabled increasingly sophisticated applications across many domains, from customer service chatbots to automated content generation. Alongside these advances, however, the vulnerability of large language models (LLMs) to adversarial inputs has emerged as a critical concern. Prompt injection attacks pose a particularly insidious challenge because they exploit the model's inherent instruction-following behavior to override intended constraints. Although prompt injection is often discussed in theoretical terms, its impact on deployed AI systems has also been observed in practice. Research has demonstrated that even models with reinforced safety mechanisms, or with specific knowledge supplied through Retrieval-Augmented Generation (RAG), can be manipulated into disclosing sensitive data, executing unauthorized instructions, or producing harmful content [4].

large language model, machine learning, natural language

Country:
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - California > Santa Clara County
    - Palo Alto (0.04)
- Europe
  - Switzerland > Basel-City
    - Basel (0.04)
  - Italy > Piedmont
    - Turin Province > Turin (0.14)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning > Generative AI (0.34)
