Prompt Injection Detection and Mitigation via AI Multi-Agent NLP Frameworks
Gosmar, Diego, Dahl, Deborah A., Gosmar, Dario
arXiv.org Artificial Intelligence
Recent advances in generative AI have enabled increasingly sophisticated applications across many domains, from customer service chatbots to automated content generation. Alongside these advances, however, the vulnerability of large language models (LLMs) to adversarial inputs has emerged as a critical concern. Prompt injection attacks pose a particularly insidious challenge because they exploit the model's inherent instruction-following behavior to override intended constraints. Although prompt injection is often discussed in theoretical terms, its impact has also been observed in deployed AI systems. Research has demonstrated that even models with reinforced safety mechanisms, or with specific knowledge supplied through Retrieval-Augmented Generation (RAG), can be manipulated into disclosing sensitive data, executing unauthorized instructions, or producing harmful content [4].
Mar-14-2025
- Country:
- North America > United States
- New York > New York County
- New York City (0.04)
- California > Santa Clara County
- Palo Alto (0.04)
- Europe
- Switzerland > Basel-City
- Basel (0.04)
- Italy > Piedmont
- Turin Province > Turin (0.14)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: