Securing AI Agents Against Prompt Injection Attacks

Ramakrishnan, Badrinath, Balaji, Akshaya

Nov-21-2025–arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) systems have become widely used for enhancing large language model capabilities, but they introduce significant security vulnerabilities through prompt injection attacks. We present a comprehensive benchmark for evaluating prompt injection risks in RAG-enabled AI agents and propose a multi-layered defense framework. Our benchmark includes 847 adversarial test cases across five attack categories: direct injection, context manipulation, instruction override, data exfiltration, and cross-context contamination. We evaluate three defense mechanisms: content filtering with embedding-based anomaly detection, hierarchical system prompt guardrails, and multi-stage response verification, across seven state-of-the-art language models. Our combined framework reduces successful attack rates from 73.2% to 8.7% while maintaining 94.3% of baseline task performance. We release our benchmark dataset and defense implementation to support future research in AI agent security.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

Nov-21-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.83)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents (0.92)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.90)
  - Machine Learning
    - Neural Networks > Deep Learning (0.50)
    - Performance Analysis > Accuracy (0.33)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found