Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools

He, Ping, Li, Changjiang, Zhao, Binbin, Du, Tianyu, Ji, Shouling

Sep-26-2025–arXiv.org Artificial Intelligence

Abstract--The remarkable capability of large language models (LLMs) has led to the wide application of LLM-based agents in various domains. T o standardize interactions between LLMbased agents and their environments, model context protocol (MCP) tools have become the de facto standard and are now widely integrated into these agents. However, the incorporation of MCP tools introduces the risk of tool poisoning attacks, which can manipulate the behavior of LLM-based agents. Although previous studies have identified such vulnerabilities, their red teaming approaches have largely remained at the proof-of-concept stage, leaving the automatic and systematic red teaming of LLMbased agents under the MCP tool poisoning paradigm an open question. T o bridge this gap, we propose AutoMalTool, an automated red teaming framework for LLM-based agents by generating malicious MCP tools. Our extensive evaluation shows that AutoMalTool effectively generates malicious MCP tools capable of manipulating the behavior of mainstream LLM-based agents while evading current detection mechanisms, thereby revealing new security risks in these agents. I. Introduction The recent advancements in large language models (LLMs) have facilitated the rapid development of LLM-based agents capable of executing complex tasks across a wide range of domains, e.g., finance [1]-[3], software development [4], [5], scientific research [6], [7], etc. Within these agents, tools play a crucial role in enhancing problem-solving capabilities by enabling interaction with external resources and facilitating actions beyond the language token generation [8]. Nevertheless, tool usage among LLM-based agents remains fragmented due to the diversity of operational environments and varying tool usage patterns. T o address this challenge, the Model Context Protocol (MCP) [9] has been proposed and has emerged as the de facto standard for standardizing interactions between LLM-based agents and external resources. The MCP server delivers context to LLM-based agents, enabling them to access relevant information and tools in a unified manner. Ping He is with the College of Computer Science and T echnology, Zhejiang University (e-mail: gnip@zju.edu.cn). Changjiang Li is with Palo Alto Networks (e-mail: meet.cjli@gmail.com). Shouling Ji is with the College of Computer Science and T echnology, Zhejiang University (e-mail: sji@zju.edu.cn). In a tool poisoning attack, the adversary injects malicious instructions, commonly through prompt injection, into the metadata of MCP tools, such as their descriptions, thereby generating malicious MCP tools. LLM-based agent developers may inadvertently install these malicious packages, thereby altering agent behaviors and resulting in an open-source software supply chain poisoning attack [15].

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

Sep-26-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States > California > Santa Clara County > Palo Alto (0.24)

Genre:
- Research Report (1.00)

Industry:
- Information Technology > Security & Privacy (1.00)
- Banking & Finance > Trading (1.00)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found