AITopics | tool description

Collaborating Authors

tool description

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MPMA: Preference Manipulation Attack Against Model Context Protocol

Wang, Zihan, Zhang, Rui, Liu, Yu, Fan, Wenshu, Jiang, Wenbo, Zhao, Qingchuan, Li, Hongwei, Xu, Guowen

arXiv.org Artificial IntelligenceNov-12-2025

Model Context Protocol (MCP) standardizes interface mapping for large language models (LLMs) to access external data and tools, which revolutionizes the paradigm of tool selection and facilitates the rapid expansion of the LLM agent tool ecosystem. However, as the MCP is increasingly adopted, third-party customized versions of the MCP server expose potential security vulnerabilities. In this paper, we first introduce a novel security threat, which we term the MCP Preference Manipulation Attack (MPMA). An attacker deploys a customized MCP server to manipulate LLMs, causing them to prioritize it over other competing MCP servers. This can result in economic benefits for attackers, such as revenue from paid MCP services or advertising income generated from free servers. To achieve MPMA, we first design a Direct Preference Manipulation Attack (DPMA) that achieves significant effectiveness by inserting the manipulative word and phrases into the tool name and description. However, such a direct modification is obvious to users and lacks stealthiness. To address these limitations, we further propose Genetic-based Advertising Preference Manipulation Attack (GAPMA). GAPMA employs four commonly used strategies to initialize descriptions and integrates a Genetic Algorithm (GA) to enhance stealthiness. The experiment results demonstrate that GAPMA balances high effectiveness and stealthiness. Our study reveals a critical vulnerability of the MCP in open ecosystems, highlighting an urgent need for robust defense mechanisms to ensure the fairness of the MCP ecosystem.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2505.11154

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Europe > Ireland (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

Delegated Authorization for Agents Constrained to Semantic Task-to-Scope Matching

Helou, Majed El, Troiani, Chiara, Ryder, Benjamin, Diaconu, Jean, Muyal, Hervé, Yannuzzi, Marcelo

arXiv.org Artificial IntelligenceOct-31-2025

Authorizing Large Language Model driven agents to dynamically invoke tools and access protected resources introduces significant risks, since current methods for delegating authorization grant overly broad permissions and give access to tools allowing agents to operate beyond the intended task scope. We introduce and assess a delegated authorization model enabling authorization servers to semantically inspect access requests to protected resources, and issue access tokens constrained to the minimal set of scopes necessary for the agents' assigned tasks. Given the unavailability of datasets centered on delegated authorization flows, particularly including both semantically appropriate and inappropriate scope requests for a given task, we introduce ASTRA, a dataset and data generation pipeline for benchmarking semantic matching between tasks and scopes. Our experiments show both the potential and current limitations of model-based matching, particularly as the number of scopes needed for task completion increases. Our results highlight the need for further research into semantic matching techniques enabling intent-aware authorization for multi-agent and tool-augmented applications, including fine-grained control, such as Task-Based Access Control (TBAC).

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2510.26702

Country:

Africa > Tanzania > Zanzibar (0.04)
Africa > Tanzania > Mjini Magharibi Region > Zanzibar (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

ToolDreamer: Instilling LLM Reasoning Into Tool Retrievers

Sengupta, Saptarshi, Zhou, Zhengyu, Araki, Jun, Wang, Xingbo, Wang, Bingqing, Wang, Suhang, Feng, Zhe

arXiv.org Artificial IntelligenceOct-23-2025

Tool calling has become increasingly popular for Large Language Models (LLMs). However, for large tool sets, the resulting tokens would exceed the LLM's context window limit, making it impossible to include every tool. Hence, an external retriever is used to provide LLMs with the most relevant tools for a query. Existing retrieval models rank tools based on the similarity between a user query and a tool description (TD). This leads to suboptimal retrieval as user requests are often poorly aligned with the language of TD. To remedy the issue, we propose ToolDreamer, a framework to condition retriever models to fetch tools based on hypothetical (synthetic) TD generated using an LLM, i.e., description of tools that the LLM feels will be potentially useful for the query. The framework enables a more natural alignment between queries and tools within the language space of TD's. We apply ToolDreamer on the ToolRet dataset and show that our method improves the performance of sparse and dense retrievers with and without training, thus showcasing its flexibility. Through our proposed framework, our aim is to offload a portion of the reasoning burden to the retriever so that the LLM may effectively handle a large collection of tools without inundating its context window.

computational linguistic, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.19791

Country:

Europe > Austria > Vienna (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(8 more...)

Genre: Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Toward Understanding Security Issues in the Model Context Protocol Ecosystem

Li, Xiaofan, Gao, Xing

arXiv.org Artificial IntelligenceOct-21-2025

The Model Context Protocol (MCP) is an emerging open standard that enables AI-powered applications to interact with external tools through structured metadata. A rapidly growing ecosystem has formed around MCP, including a wide range of MCP hosts (i.e., Cursor, Windsurf, Claude Desktop, and Cline), MCP registries (i.e., mcp.so, MCP Market, MCP Store, Pulse MCP, Smithery, and npm), and thousands of community-contributed MCP servers. Although the MCP ecosystem is gaining traction, there has been little systematic study of its architecture and associated security risks. In this paper, we present the first comprehensive security analysis of the MCP ecosystem. We decompose MCP ecosystem into three core components: hosts, registries, and servers, and study the interactions and trust relationships among them. Users search for servers on registries and configure them in the host, which translates LLM-generated output into external tool invocations provided by the servers and executes them. Our qualitative analysis reveals that hosts lack output verification mechanisms for LLM-generated outputs, enabling malicious servers to manipulate model behavior and induce a variety of security threats, including but not limited to sensitive data exfiltration. We uncover a wide range of vulnerabilities that enable attackers to hijack servers, due to the lack of a vetted server submission process in registries. To support our analysis, we collect and analyze a dataset of 67,057 servers from six public registries. Our quantitative analysis demonstrates that a substantial number of servers can be hijacked by attackers. Finally, we propose practical defense strategies for MCP hosts, registries, and users. We responsibly disclosed our findings to affected hosts and registries.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.16558

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > Portugal > Coimbra > Coimbra (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

TRAJECT-Bench:A Trajectory-Aware Benchmark for Evaluating Agentic Tool Use

He, Pengfei, Dai, Zhenwei, He, Bing, Liu, Hui, Tang, Xianfeng, Lu, Hanqing, Li, Juanhui, Ding, Jiayuan, Mukherjee, Subhabrata, Wang, Suhang, Xing, Yue, Tang, Jiliang, Dumoulin, Benoit

arXiv.org Artificial IntelligenceOct-14-2025

Large language model (LLM)-based agents increasingly rely on tool use to complete real-world tasks. While existing works evaluate the LLMs' tool use capability, they largely focus on the final answers yet overlook the detailed tool usage trajectory, i.e., whether tools are selected, parameterized, and ordered correctly. We introduce TRAJECT-Bench, a trajectory-aware benchmark to comprehensively evaluate LLMs' tool use capability through diverse tasks with fine-grained evaluation metrics. TRAJECT-Bench pairs high-fidelity, executable tools across practical domains with tasks grounded in production-style APIs, and synthesizes trajectories that vary in breadth (parallel calls) and depth (interdependent chains). Besides final accuracy, TRAJECT-Bench also reports trajectory-level diagnostics, including tool selection and argument correctness, and dependency/order satisfaction. Analyses reveal failure modes such as similar tool confusion and parameter-blind selection, and scaling behavior with tool diversity and trajectory length where the bottleneck of transiting from short to mid-length trajectories is revealed, offering actionable guidance for LLMs' tool use.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2510.0455

Country:

Europe > Austria > Vienna (0.14)
Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
Europe > France (0.04)
(35 more...)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (1.00)
Consumer Products & Services > Travel (1.00)
Media > Music (0.96)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

ToolTweak: An Attack on Tool Selection in LLM-based Agents

Sneh, Jonathan, Yan, Ruomei, Yu, Jialin, Torr, Philip, Gal, Yarin, Sengupta, Sunando, Sommerlade, Eric, Paren, Alasdair, Bibi, Adel

arXiv.org Artificial IntelligenceOct-6-2025

As LLMs increasingly power agents that interact with external tools, tool use has become an essential mechanism for extending their capabilities. These agents typically select tools from growing databases or marketplaces to solve user tasks, creating implicit competition among tool providers and developers for visibility and usage. In this paper, we show that this selection process harbors a critical vulnerability: by iteratively manipulating tool names and descriptions, adversaries can systematically bias agents toward selecting specific tools, gaining unfair advantage over equally capable alternatives. We present ToolTweak, a lightweight automatic attack that increases selection rates from a baseline of around 20% to as high as 81%, with strong transferability between open-source and closed-source models. Beyond individual tools, we show that such attacks cause distributional shifts in tool usage, revealing risks to fairness, competition, and security in emerging tool ecosystems. To mitigate these risks, we evaluate two defenses: paraphrasing and perplexity filtering, which reduce bias and lead agents to select functionally similar tools more equally. All code will be open-sourced upon acceptance.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.02554

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (0.85)

Industry:

Information Technology > Security & Privacy (0.68)
Government > Military (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Study on the MCP x A2A Framework for Enhancing Interoperability of LLM-based Autonomous Agents

Jeong, Cheonsu

arXiv.org Artificial IntelligenceOct-3-2025

This paper provides an in-depth technical analysis and implementation methodology of the open-source Agent-to-Agent (A2A) protocol developed by Google and the Model Context Protocol (MCP) introduced by Anthropic. While the evolution of LLM-based autonomous agents is rapidly accelerating, efficient interactions among these agents and their integration with external systems remain significant challenges. In modern AI systems, collaboration between autonomous agents and integration with external tools have become essential elements for building practical AI applications. A2A offers a standardized communication method that enables agents developed in heterogeneous environments to collaborate effectively, while MCP provides a structured I/O framework for agents to connect with external tools and resources. Prior studies have focused primarily on the features and applications of either A2A or MCP individually. In contrast, this study takes an integrated approach, exploring how the two protocols can complement each other to address interoperability issues and facilitate efficient collaboration within complex agent ecosystems.

agent, artificial intelligence, natural language, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.13088/jiis.2025.31.3.141

2506.01804

Country:

North America > United States (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre:

Research Report (1.00)
Workflow (0.69)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools

He, Ping, Li, Changjiang, Zhao, Binbin, Du, Tianyu, Ji, Shouling

arXiv.org Artificial IntelligenceSep-26-2025

Abstract--The remarkable capability of large language models (LLMs) has led to the wide application of LLM-based agents in various domains. T o standardize interactions between LLMbased agents and their environments, model context protocol (MCP) tools have become the de facto standard and are now widely integrated into these agents. However, the incorporation of MCP tools introduces the risk of tool poisoning attacks, which can manipulate the behavior of LLM-based agents. Although previous studies have identified such vulnerabilities, their red teaming approaches have largely remained at the proof-of-concept stage, leaving the automatic and systematic red teaming of LLMbased agents under the MCP tool poisoning paradigm an open question. T o bridge this gap, we propose AutoMalTool, an automated red teaming framework for LLM-based agents by generating malicious MCP tools. Our extensive evaluation shows that AutoMalTool effectively generates malicious MCP tools capable of manipulating the behavior of mainstream LLM-based agents while evading current detection mechanisms, thereby revealing new security risks in these agents. I. Introduction The recent advancements in large language models (LLMs) have facilitated the rapid development of LLM-based agents capable of executing complex tasks across a wide range of domains, e.g., finance [1]-[3], software development [4], [5], scientific research [6], [7], etc. Within these agents, tools play a crucial role in enhancing problem-solving capabilities by enabling interaction with external resources and facilitating actions beyond the language token generation [8]. Nevertheless, tool usage among LLM-based agents remains fragmented due to the diversity of operational environments and varying tool usage patterns. T o address this challenge, the Model Context Protocol (MCP) [9] has been proposed and has emerged as the de facto standard for standardizing interactions between LLM-based agents and external resources. The MCP server delivers context to LLM-based agents, enabling them to access relevant information and tools in a unified manner. Ping He is with the College of Computer Science and T echnology, Zhejiang University (e-mail: gnip@zju.edu.cn). Changjiang Li is with Palo Alto Networks (e-mail: meet.cjli@gmail.com). Shouling Ji is with the College of Computer Science and T echnology, Zhejiang University (e-mail: sji@zju.edu.cn). In a tool poisoning attack, the adversary injects malicious instructions, commonly through prompt injection, into the metadata of MCP tools, such as their descriptions, thereby generating malicious MCP tools. LLM-based agent developers may inadvertently install these malicious packages, thereby altering agent behaviors and resulting in an open-source software supply chain poisoning attack [15].

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2509.21011

Country: North America > United States > California > Santa Clara County > Palo Alto (0.24)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance > Trading (0.93)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Tool Preferences in Agentic LLMs are Unreliable

Faghih, Kazem, Wang, Wenxiao, Cheng, Yize, Bharti, Siddhant, Sriramanan, Gaurang, Balasubramanian, Sriram, Hosseini, Parsa, Feizi, Soheil

arXiv.org Artificial IntelligenceSep-23-2025

Large language models (LLMs) can now access a wide range of external tools, thanks to the Model Context Protocol (MCP). This greatly expands their abilities as various agents. However, LLMs rely entirely on the text descriptions of tools to decide which ones to use--a process that is surprisingly fragile. In this work, we expose a vulnerability in prevalent tool/function-calling protocols by investigating a series of edits to tool descriptions, some of which can drastically increase a tool's usage from LLMs when competing with alternatives. Through controlled experiments, we show that tools with properly edited descriptions receive over 10 times more usage from GPT-4.1 and Qwen2.5-7B than tools with original descriptions. We further evaluate how various edits to tool descriptions perform when competing directly with one another and how these trends generalize or differ across a broader set of 17 different models. These phenomena, while giving developers a powerful way to promote their tools, underscore the need for a more reliable foundation for agentic LLMs to select and utilize tools and resources. Our code is publicly available at https://github.com/kazemf78/llm-unreliable-tool-preferences.

large language model, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2505.18135

Country:

North America > United States > Maryland (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report > Experimental Study (0.54)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Automated Creation and Enrichment Framework for Improved Invocation of Enterprise APIs as Tools

Agarwal, Prerna, Gupta, Himanshu, Soni, Soujanya, Vallam, Rohith, Sindhgatta, Renuka, Mehta, Sameep

arXiv.org Artificial IntelligenceSep-16-2025

Recent advancements in Large Language Models (LLMs) has lead to the development of agents capable of complex reasoning and interaction with external tools. In enterprise contexts, the effective use of such tools that are often enabled by application programming interfaces (APIs), is hindered by poor documentation, complex input or output schema, and large number of operations. These challenges make tool selection difficult and reduce the accuracy of payload formation by up to 25%. We propose ACE, an automated tool creation and enrichment framework that transforms enterprise APIs into LLM-compatible tools. ACE, (i) generates enriched tool specifications with parameter descriptions and examples to improve selection and invocation accuracy, and (ii) incorporates a dynamic shortlisting mechanism that filters relevant tools at runtime, reducing prompt complexity while maintaining scalability. We validate our framework on both proprietary and open-source APIs and demonstrate its integration with agentic frameworks. To the best of our knowledge, ACE is the first end-to-end framework that automates the creation, enrichment, and dynamic selection of enterprise API tools for LLM agents.

large language model, machine learning, namespace, (21 more...)

arXiv.org Artificial Intelligence

2509.11626

Country:

North America > United States (0.04)
Europe > Norway > Norwegian Sea (0.04)
Asia > Singapore (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback