AITopics

Large Language Models (LLMs) are increasingly used to build autonomous agents that perform complex tasks with external tools, often exposed through APIs in enterprise systems. Direct use of these APIs is difficult due to the complex input schema and verbose responses. Current benchmarks overlook these challenges, leaving a gap in assessing API readiness for agent-driven automation. We present a testing framework that systematically evaluates enterprise APIs when wrapped as Python tools for LLM-based agents. The framework generates data-aware test cases, translates them into natural language instructions, and evaluates whether agents can correctly invoke the tool, handle their inputs, and process its responses. We apply the framework to generate over 2400 test cases across different domains and develop a taxonomy of common errors, including input misinterpretation, output failures, and schema mismatches. We further classify errors to support debugging and tool refinement. Our framework provides a systematic approach to enabling enterprise APIs as reliable tools for agent-based applications.

artificial intelligence, large language model, natural language, (18 more...)

2504.15546

Country: Europe (0.28)

Genre:

Workflow (0.93)
Research Report (0.64)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Shaffer, Benjamin David, Kinch, Brooks, Klobusicky, Joseph, Hsieh, M. Ani, Trask, Nathaniel

Physics-informed sensor coverage through structure preserving machine learning

We present a machine learning framework for adaptive source localization in which agents use a structure-preserving digital twin of a coupled hydrodynamic-transport system for real-time trajectory planning and data assimilation. The twin is constructed with conditional neural Whitney forms (CNWF), coupling the numerical guarantees of finite element exterior calculus (FEEC) with transformer-based operator learning. The resulting model preserves discrete conservation, and adapts in real time to streaming sensor data. It employs a conditional attention mechanism to identify: a reduced Whitney-form basis; reduced integral balance equations; and a source field, each compatible with given sensor measurements. The induced reduced-order environmental model retains the stability and consistency of standard finite-element simulation, yielding a physically realizable, regular mapping from sensor data to the source field. We propose a staggered scheme that alternates between evaluating the digital twin and applying Lloyd's algorithm to guide sensor placement, with analysis providing conditions for monotone improvement of a coverage functional. Using the predicted source field as an importance function within an optimal-recovery scheme, we demonstrate recovery of point sources under continuity assumptions, highlighting the role of regularity as a sufficient condition for localization. Experimental comparisons with physics-agnostic transformer architectures show improved accuracy in complex geometries when physical constraints are enforced, indicating that structure preservation provides an effective inductive bias for source identification.

artificial intelligence, machine learning, sensor, (18 more...)

2509.10363

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Energy (0.68)
Government > Regional Government (0.46)

Technology:

Information Technology > Communications > Networks > Sensor Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Moskovskaya, Elizaveta D., Moscowsky, Anton D.

Robot guide with multi-agent control and automatic scenario generation with LLM

The work describes the development of a hybrid control architecture for an anthropomorphic tour guide robot, combining a multi-agent resource management system with automatic behavior scenario generation based on large language models. The proposed approach aims to overcome the limitations of traditional systems, which rely on manual tuning of behavior scenarios. These limitations include manual configuration, low flexibility, and lack of naturalness in robot behavior. The process of preparing tour scenarios is implemented through a two-stage generation: first, a stylized narrative is created, then non-verbal action tags are integrated into the text. The multi-agent system ensures coordination and conflict resolution during the execution of parallel actions, as well as maintaining default behavior after the completion of main operations, contributing to more natural robot behavior. The results obtained from the trial demonstrate the potential of the proposed approach for automating and scaling social robot control systems.

artificial intelligence, natural language, scenario, (17 more...)

2509.10317

Country:

Europe (1.00)
North America > United States (0.68)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Petković, Marko, Menkovski, Vlado, Calero, Sofía

Towards Fully Automated Molecular Simulations: Multi-Agent Framework for Simulation Setup and Force Field Extraction

Automated characterization of porous materials has the potential to accelerate materials discovery, but it remains limited by the complexity of simulation setup and force field selection. We propose a multi-agent framework in which LLM-based agents can autonomously understand a characterization task, plan appropriate simulations, assemble relevant force fields, execute them and interpret their results to guide subsequent steps. As a first step toward this vision, we present a multi-agent system for literature-informed force field extraction and automated RASPA simulation setup. Initial evaluations demonstrate high correctness and reproducibility, highlighting this approach's potential to enable fully autonomous, scalable materials characterization.

artificial intelligence, force field, simulation, (14 more...)

2509.1021

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Uccello, Federica, Nadjm-Tehrani, Simin

Investigating Feature Attribution for 5G Network Intrusion Detection

With the rise of fifth-generation (5G) networks in critical applications, it is urgent to move from detection of malicious activity to systems capable of providing a reliable verdict suitable for mitigation. In this regard, understanding and interpreting machine learning (ML) models' security alerts is crucial for enabling actionable incident response orchestration. Explainable Artificial Intelligence (XAI) techniques are expected to enhance trust by providing insights into why alerts are raised. A dominant approach statistically associates feature sets that can be correlated to a given alert. This paper starts by questioning whether such attribution is relevant for future generation communication systems, and investigates its merits in comparison with an approach based on logical explanations. We extensively study two methods, SHAP and VoTE-XAI, by analyzing their interpretations of alerts generated by an XGBoost model in three different use cases with several 5G communication attacks. We identify three metrics for assessing explanations: sparsity, how concise they are; stability, how consistent they are across samples from the same attack type; and efficiency, how fast an explanation is generated. As an example, in a 5G network with 92 features, 6 were deemed important by VoTE-XAI for a Denial of Service (DoS) variant, ICMPFlood, while SHAP identified over 20. More importantly, we found a significant divergence between features selected by SHAP and VoTE-XAI. However, none of the top-ranked features selected by SHAP were missed by VoTE-XAI. When it comes to efficiency of providing interpretations, we found that VoTE-XAI is significantly more responsive, e.g. it provides a single explanation in under 0.002 seconds, in a high-dimensional setting (478 features).

explanation, machine learning, natural language, (19 more...)

2509.10206

Country: Europe (0.93)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Telecommunications (1.00)
Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.89)
(2 more...)

Tomasev, Nenad, Franklin, Matija, Leibo, Joel Z., Jacobs, Julian, Cunningham, William A., Gabriel, Iason, Osindero, Simon

Virtual Agent Economies

The rapid adoption of autonomous AI agents is giving rise to a new economic layer where agents transact and coordinate at scales and speeds beyond direct human oversight. We propose the "sandbox economy" as a framework for analyzing this emergent system, characterizing it along two key dimensions: its origins (emergent vs. intentional) and its degree of separateness from the established human economy (permeable vs. impermeable). Our current trajectory points toward a spontaneous emergence of a vast and highly permeable AI agent economy, presenting us with opportunities for an unprecedented degree of coordination as well as significant challenges, including systemic economic risk and exacerbated inequality. Here we discuss a number of possible design choices that may lead to safely steerable AI agent markets. In particular, we consider auction mechanisms for fair resource allocation and preference resolution, the design of AI "mission economies" to coordinate around achieving collective goals, and socio-technical infrastructure needed to ensure trust, safety, and accountability. By doing this, we argue for the proactive design of steerable agent markets to ensure the coming technological shift aligns with humanity's long-term collective flourishing.

artificial intelligence, deep learning, machine learning, (17 more...)

2509.10147

Country:

Europe (0.67)
North America > United States > California (0.45)

Genre: Research Report > Promising Solution (0.45)

Industry:

Social Sector (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

XAgents: A Unified Framework for Multi-Agent Cooperation via IF-THEN Rules and Multipolar Task Processing Graph

Yang, Hailong, Gu, Mingxian, Wang, Jianqi, Wang, Guanjin, Deng, Zhaohong

The rapid advancement of Large Language Models (LLMs) has significantly enhanced the capabilities of Multi-Agent Systems (MAS) in supporting humans with complex, real-world tasks. However, MAS still face challenges in effective task planning when handling highly complex tasks with uncertainty, often resulting in misleading or incorrect outputs that hinder task execution. To address this, we propose XAgents, a unified multi-agent cooperative framework built on a multipolar task processing graph and IF-THEN rules. XAgents uses the multipolar task processing graph to enable dynamic task planning and handle task uncertainty. During subtask processing, it integrates domain-specific IF-THEN rules to constrain agent behaviors, while global rules enhance inter-agent collaboration. We evaluate the performance of XAgents across three distinct datasets, demonstrating that it consistently surpasses state-of-the-art single-agent and multi-agent approaches in both knowledge-typed and logic-typed question-answering tasks. The codes for XAgents are available at: https://github.com/AGI-FHBC/XAgents.

artificial intelligence, expert system, xagent, (16 more...)

2509.10054

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Shin, Woong, Souza, Renan, Rosendo, Daniel, Suter, Frédéric, Wang, Feiyi, Balaprakash, Prasanna, da Silva, Rafael Ferreira

The (R)evolution of Scientific Workflows in the Agentic AI Era: Towards Autonomous Science

Modern scientific discovery increasingly requires coordinating distributed facilities and heterogeneous resources, forcing researchers to act as manual workflow coordinators rather than scientists. Advances in AI leading to AI agents show exciting new opportunities that can accelerate scientific discovery by providing intelligence as a component in the ecosystem. However, it is unclear how this new capability would materialize and integrate in the real world. To address this, we propose a conceptual framework where workflows evolve along two dimensions which are intelligence (from static to intelligent) and composition (from single to swarm) to chart an evolutionary path from current workflow management systems to fully autonomous, distributed scientific laboratories. With these trajectories in mind, we present an architectural blueprint that can help the community take the next steps towards harnessing the opportunities in autonomous science with the potential for 100x discovery acceleration and transformational scientific workflows.

evolutionary algorithm, large language model, machine learning, (20 more...)

2509.09915

Country: North America > United States (1.00)

Genre: Workflow (1.00)

Industry:

Information Technology (1.00)
Energy (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.94)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
(2 more...)

Tackling One Health Risks: How Large Language Models are leveraged for Risk Negotiation and Consensus-building

Fetsch, Alexandra, Savvateev, Iurii, Romdhane, Racem Ben, Wiedmann, Martin, Dimov, Artemiy, Durkalec, Maciej, Teichmann, Josef, Zinsstag, Jakob, Koutsoumanis, Konstantinos, Rajkovic, Andreja, Mann, Jason, Tonolla, Mauro, Ehling-Schulz, Monika, Filter, Matthias, Johler, Sophia

Tackling One Health Risks: How Large Language Models are leveraged for Risk Negotiation and Consensus - building. Study Centre for Land-use related Evaluation procedures, One-Health, German Federal Institute for Risk Assessment, Berlin, Germany; Email: Maciej.Durkalec@bfr.bund.de Faculty of Bioscience Engineering, Department. of Food Technology, Safety and Health, Ghent University, Ghent, Belgium, E - mail: Andreja.Rajkovic@UGent.be Abstract Key global challenges of our times are characterized by complex interdependencies and can only be effectively addressed through an integrated, participatory effort. Conventional risk analysis frameworks often reduce complexity to ensure manageability, crea ting silos that hinder comprehensive solutions. A fundamental shift towards holistic strategies is essential to enable effective negotiations between different sectors and to balance the competing interests of stakeholders. However, achieving this balance is often hindered by limited time, vast amounts of information, and the complexity of integrating diverse perspectives. This study presents an AI - assisted negotiation framework that incorporates large language models (LLMs) and AI - based autonomous agents i nto a negotiation - centered risk analysis workflow. The framework enables stakeholders to simulate negotiations, systematically model dynamics, anticipate compromises, and evaluate solution impacts. By leveraging LLMs' semantic analysis capabilities we coul d mitigate information overload and augment decision - making process under time constraints. Proof - of - concept implementations were conducted in two real - world scenarios: (i) prudent use of a biopesticide, and (ii) targeted wild animal population control. Ou r work demonstrates the potential of AI - assisted negotiation to address the current lack of tools for cross - sectoral engagement.

artificial intelligence, large language model, natural language, (17 more...)

2509.09906

Country:

Europe > Germany > Berlin (0.24)
Europe > Belgium > Flanders > East Flanders > Ghent (0.24)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre:

Workflow (0.90)
Research Report (0.82)

Industry:

Law (1.00)
Health & Medicine > Consumer Health (1.00)
Food & Agriculture > Agriculture (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools

Guo, Zikang, Xu, Benfeng, Zhu, Chiwei, Hong, Wentao, Wang, Xiaorui, Mao, Zhendong

The Model Context Protocol (MCP) is rapidly emerging as a pivotal open standard, designed to enhance agent-tool integration and interoperability, and is positioned to unlock a new era of powerful, interconnected, and genuinely utilitarian agentic AI. However, despite MCP's growing adoption, existing benchmarks often fail to capture real-world agent performance within this new paradigm, leading to a distorted perception of their true operational value and an inability to reliably differentiate proficiencies. To bridge this critical evaluation gap, we introduce MCP-AgentBench -- a comprehensive benchmark specifically engineered to rigorously assess language agent capabilities in MCP-mediated tool interactions. Core contributions of MCP-AgentBench include: the establishment of a robust MCP testbed comprising 33 operational servers with 188 distinct tools; the development of a benchmark featuring 600 systematically designed queries distributed across 6 distinct categories of varying interaction complexity; and the introduction of MCP-Eval, a novel outcome-oriented evaluation methodology prioritizing real-world task success. Through extensive empirical evaluation of leading language agents, we provide foundational insights. MCP-AgentBench aims to equip the research community with a standardized and reliable framework to build, validate, and advance agents capable of fully leveraging MCP's transformative benefits, thereby accelerating progress toward truly capable and interoperable AI systems.

large language model, machine learning, natural language, (20 more...)

2509.09734

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)