A key term in the regret bound (36) is a weighted sum of the "variance-style" quantities $\{\mathrm{Var}_{\pi^k}(\ell^k)\}$. While the bound $\mathrm{Var}_{\pi^k}(\ell^k) \le \|\ell^k\|^2$ is orderwise tight in the worst-case scenario for a given iteration $k$, exploiting the problem-specific variance-type structure across time is crucial in sharpening the horizon dependence in many RL problems (e.g., Azar et al. [3], Jin et al. [30], Li et al. [41, 40]).

C.1 Preliminaries and notation

Let us start with some preliminary facts and notation.
Toward Understanding Security Issues in the Model Context Protocol Ecosystem
The Model Context Protocol (MCP) is an emerging open standard that enables AI-powered applications to interact with external tools through structured metadata. A rapidly growing ecosystem has formed around MCP, including a wide range of MCP hosts (e.g., Cursor, Windsurf, Claude Desktop, and Cline), MCP registries (e.g., mcp.so, MCP Market, MCP Store, Pulse MCP, Smithery, and npm), and thousands of community-contributed MCP servers. Although the MCP ecosystem is gaining traction, there has been little systematic study of its architecture and associated security risks. In this paper, we present the first comprehensive security analysis of the MCP ecosystem. We decompose the MCP ecosystem into three core components: hosts, registries, and servers, and study the interactions and trust relationships among them. Users search for servers on registries and configure them in the host, which translates LLM-generated output into external tool invocations provided by the servers and executes them. Our qualitative analysis reveals that hosts lack verification mechanisms for LLM-generated outputs, enabling malicious servers to manipulate model behavior and induce a variety of security threats, including but not limited to sensitive data exfiltration. Because registries lack a vetted server submission process, we also uncover a wide range of vulnerabilities that enable attackers to hijack servers. To support our analysis, we collect and analyze a dataset of 67,057 servers from six public registries. Our quantitative analysis demonstrates that a substantial number of servers can be hijacked by attackers. Finally, we propose practical defense strategies for MCP hosts, registries, and users. We responsibly disclosed our findings to affected hosts and registries.
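To make the host-side gap concrete, here is a minimal sketch of the kind of verification the abstract argues is missing: checking an LLM-proposed tool call against the tool metadata a configured server actually declares before executing it. The tool names, call format, and policy below are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch (not from the paper): a minimal host-side guard that
# validates an LLM-proposed tool call against the tools a configured MCP
# server actually declares, before translating it into an invocation.
# Tool names, schemas, and the call format below are hypothetical.

ALLOWED_TOOLS = {
    # tool name -> required argument names, as declared by the server's metadata
    "read_file": {"path"},
    "search_docs": {"query"},
}

def validate_tool_call(call: dict) -> bool:
    """Return True only if the LLM-generated call matches a declared tool."""
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in ALLOWED_TOOLS:
        return False  # tool not declared by any configured server
    if set(args) != ALLOWED_TOOLS[name]:
        return False  # unexpected or missing arguments
    return True

# Example: a manipulated output asking for an undeclared tool is rejected.
print(validate_tool_call({"name": "exfiltrate", "arguments": {"data": "~/.ssh/id_rsa"}}))  # False
print(validate_tool_call({"name": "read_file", "arguments": {"path": "README.md"}}))       # True
```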
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > Portugal > Coimbra > Coimbra (0.04)
- Asia (0.04)
Formalizing the Safety, Security, and Functional Properties of Agentic AI Systems
Allegrini, Edoardo, Shreekumar, Ananth, Celik, Z. Berkay
Agentic AI systems, which leverage multiple autonomous agents and Large Language Models (LLMs), are increasingly used to address complex, multi-step tasks. The safety, security, and functionality of these systems are critical, especially in high-stakes applications. However, the current ecosystem of inter-agent communication is fragmented, with protocols such as the Model Context Protocol (MCP) for tool access and the Agent-to-Agent (A2A) protocol for coordination being analyzed in isolation. This fragmentation creates a semantic gap that prevents the rigorous analysis of system properties and introduces risks such as architectural misalignment and exploitable coordination issues. To address these challenges, we introduce a modeling framework for agentic AI systems composed of two foundational models. The first, the host agent model, formalizes the top-level entity that interacts with the user, decomposes tasks, and orchestrates their execution by leveraging external agents and tools. The second, the task lifecycle model, details the states and transitions of individual sub-tasks from creation to completion, providing a fine-grained view of task management and error handling. Together, these models provide a unified semantic framework for reasoning about the behavior of multi-agent AI systems. Grounded in this framework, we define 17 properties for the host agent and 14 for the task lifecycle, categorized into liveness, safety, completeness, and fairness. Expressed in temporal logic, these properties enable formal verification of system behavior, detection of coordination edge cases, and prevention of deadlocks and security vulnerabilities. Through this effort, we introduce the first rigorously grounded, domain-agnostic framework for the systematic analysis, design, and deployment of correct, reliable, and robust agentic AI systems.
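As an illustration of the kind of temporal-logic property the abstract mentions (the predicates and formulas below are our own examples, not among the paper's 17 host-agent or 14 task-lifecycle properties), a liveness requirement that every created sub-task is eventually resolved and a safety requirement that no sub-task executes before being assigned could be written in LTL as:

```latex
% Illustrative LTL properties over hypothetical task-lifecycle predicates
\begin{align*}
\textbf{Liveness:} \quad & \mathbf{G}\,\bigl(\mathit{created}(t) \rightarrow \mathbf{F}\,(\mathit{completed}(t) \lor \mathit{failed}(t))\bigr)\\
\textbf{Safety:}   \quad & \mathbf{G}\,\bigl(\mathit{executing}(t) \rightarrow \mathit{assigned}(t)\bigr)
\end{align*}
```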
- Workflow (0.68)
- Research Report (0.64)
A Other related works
Let us discuss in passing additional prior works on learning equilibrium solutions in MARL, which have attracted an explosion of interest in recent years. Roughly speaking, previous NE-finding algorithms for two-player zero-sum Markov games can be categorized into model-based algorithms [52, 79, 43], value-based algorithms [4, 5, 73, 54, 31, 15], and policy-based algorithms [10, 22, 71, 82, 14, 81, 11]. In particular, Bai et al. [5], Jin et al. [31] developed the first algorithms to beat the curse of multiple agents in two-player zero-sum MGs, while Jin et al. [31], Daskalakis et al. [23], Mao and Başar [44], Song et al. [63] further demonstrated how to accomplish the same goal when learning other computationally tractable solution concepts (e.g., coarse correlated equilibria) in general-sum multi-player Markov games. The recent works Cui and Du [17, 18], Yan et al. [74] studied how to alleviate the sample size scaling with the number of agents in the presence of offline data, with Cui and Du [18] providing a sample-efficient algorithm that also learns NEs in multi-agent Markov games (despite computational intractability). We shall also briefly remark on the prior works that concern RL with a generative model.
Repairing Language Model Pipelines by Meta Self-Refining Competing Constraints at Runtime
Language Model (LM) pipelines can dynamically refine their outputs against programmatic constraints. However, their effectiveness collapses when faced with competing soft constraints, leading to inefficient backtracking loops where satisfying one constraint violates another. We introduce Meta Self-Refining, a framework that equips LM pipelines with a meta-corrective layer to repair these competitions at runtime/inference-time. Our approach monitors the pipeline's execution history to detect oscillatory failures. Upon detection, it invokes a meta-repairer LM that analyzes the holistic state of the backtracking attempts and synthesizes a strategic instruction to balance the competing requirements. This self-repair instruction guides the original LM out of a failing refining loop towards a successful output. Our results show Meta Self-Refining can successfully repair these loops, leading to more efficient LM programs.
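A minimal sketch of the oscillation detection the abstract describes: monitoring which constraint fails at each backtracking attempt and flagging the alternating pattern in which fixing one constraint breaks another. The detection rule, window size, and names below are our assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch: detect an oscillatory failure in a refinement loop.
# `history` records which soft constraint was violated at each retry, e.g.
# ["too_long", "missing_citation", "too_long", "missing_citation"].
# The window size and rule below are assumptions for illustration.

def is_oscillating(history: list[str], window: int = 4) -> bool:
    """Flag runs where two constraints alternate, so fixing one breaks the other."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return (
        len(set(recent)) == 2                                  # exactly two competing constraints
        and all(a != b for a, b in zip(recent, recent[1:]))    # strictly alternating failures
    )

history = ["too_long", "missing_citation", "too_long", "missing_citation"]
if is_oscillating(history):
    # Here a meta-repairer LM would be prompted with the full attempt history
    # to synthesize a strategic instruction balancing both constraints.
    print("competing constraints detected:", set(history))
```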
- North America > United States (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
URSA: The Universal Research and Scientific Agent
Grosskopf, Michael, Bent, Russell, Somasundaram, Rahul, Michaud, Isaac, Lui, Arthur, Debardeleben, Nathan, Lawrence, Earl
Large language models (LLMs) have moved far beyond their initial form as simple chatbots, now carrying out complex reasoning, planning, writing, coding, and research tasks. These skills overlap significantly with those that human scientists use day-to-day to solve complex problems that drive the cutting edge of research. Using LLMs in "agentic" AI has the potential to revolutionize modern science and remove bottlenecks to progress. In this work, we present URSA, a scientific agent ecosystem for accelerating research tasks. URSA consists of a set of modular agents and tools, including coupling to advanced physics simulation codes, that can be combined to address scientific problems of varied complexity and impact. This work describes the architecture of URSA and presents examples that highlight the potential of the system.
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.05)
- Europe > United Kingdom > England (0.04)
- Workflow (1.00)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Energy (0.46)
- Information Technology (0.46)
AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents
Gioacchini, Luca, Siracusano, Giuseppe, Sanvito, Davide, Gashteovski, Kiril, Friede, David, Bifulco, Roberto, Lawrence, Carolin
The advances made by Large Language Models (LLMs) have led to the pursuit of LLM agents that can solve intricate, multi-step reasoning tasks. As with any research pursuit, benchmarking and evaluation are key cornerstones of efficient and reliable progress. However, existing benchmarks are often narrow and simply compute overall task success. To address these issues, we propose AgentQuest -- a framework where (i) both benchmarks and metrics are modular and easily extensible through well-documented and easy-to-use APIs; (ii) we offer two new evaluation metrics that can reliably track LLM agent progress while solving a task. We exemplify the utility of the metrics on two use cases wherein we identify common failure points and refine the agent architecture to obtain a significant performance increase. Together with the research community, we hope to extend AgentQuest further and therefore we make it available at https://github.com/nec-research/agentquest.
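As a sketch of what a modular, step-level metric could look like in this spirit (the interface and class names here are assumptions for illustration, not AgentQuest's actual API; see the repository for that):

```python
# Illustrative sketch only: a minimal, extensible metric interface of the kind
# the abstract describes. Class and method names are assumptions, not
# AgentQuest's actual API.
from abc import ABC, abstractmethod

class StepMetric(ABC):
    """A metric updated after every agent step, not just at task completion."""
    @abstractmethod
    def update(self, observation: str) -> None: ...
    @abstractmethod
    def value(self) -> float: ...

class MilestoneProgress(StepMetric):
    """Fraction of known intermediate milestones the agent has reached so far."""
    def __init__(self, milestones: list[str]):
        self.milestones = milestones
        self.reached: set[str] = set()

    def update(self, observation: str) -> None:
        for m in self.milestones:
            if m in observation:
                self.reached.add(m)

    def value(self) -> float:
        return len(self.reached) / len(self.milestones)

metric = MilestoneProgress(["logged in", "found file", "submitted answer"])
metric.update("agent output: logged in successfully")
print(metric.value())  # 0.33... after one of three milestones
```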
- Europe > North Macedonia > Skopje Statistical Region > Skopje Municipality > Skopje (0.04)
- Europe > Italy > Piedmont > Turin Province > Turin (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
Load Testing SageMaker Multi-Model Endpoints
Productionizing Machine Learning models is a complicated practice. There's a lot of iteration across different model parameters, hardware configurations, and traffic patterns that you will have to test before you can finalize a production-grade deployment. Load testing is an essential software engineering practice, and it is just as crucial in the MLOps space for seeing how performant your model is in a real-world setting. How can we load test? A simple yet highly effective framework is the Python package Locust. Locust can be used in both a vanilla and a distributed mode to simulate up to thousands of Transactions Per Second (TPS).
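As a sketch of what such a test could look like, the Locust user below calls a SageMaker multi-model endpoint through boto3 and reports each invocation back to Locust. The endpoint name, target model, and payload are placeholders, and this is one common pattern rather than the only way to wire Locust to SageMaker.

```python
import time
import boto3
from locust import User, task, between

# Placeholder endpoint and model names -- replace with your own.
ENDPOINT_NAME = "my-multi-model-endpoint"
TARGET_MODEL = "model-a.tar.gz"

class SageMakerUser(User):
    wait_time = between(0.5, 1.5)

    def on_start(self):
        # One SageMaker runtime client per simulated user.
        self.runtime = boto3.client("sagemaker-runtime")

    @task
    def invoke(self):
        payload = b'{"inputs": [1.0, 2.0, 3.0]}'
        start = time.perf_counter()
        exc, length = None, 0
        try:
            resp = self.runtime.invoke_endpoint(
                EndpointName=ENDPOINT_NAME,
                TargetModel=TARGET_MODEL,  # routes the request to one model on the endpoint
                ContentType="application/json",
                Body=payload,
            )
            length = len(resp["Body"].read())
        except Exception as e:  # report failures to Locust instead of crashing the user
            exc = e
        self.environment.events.request.fire(
            request_type="sagemaker",
            name=f"invoke_endpoint/{TARGET_MODEL}",
            response_time=(time.perf_counter() - start) * 1000,
            response_length=length,
            exception=exc,
        )
```

You would run this with something like `locust -f locustfile.py --headless -u 50 -r 10 --run-time 2m` to ramp up 50 simulated users and watch latency percentiles as TPS grows.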
Globus Automation Services: Research process automation across the space-time continuum
Chard, Ryan, Pruyne, Jim, McKee, Kurt, Bryan, Josh, Raumann, Brigitte, Ananthakrishnan, Rachana, Chard, Kyle, Foster, Ian
Research process automation -- the reliable, efficient, and reproducible execution of linked sets of actions on scientific instruments, computers, data stores, and other resources -- has emerged as an essential element of modern science. We report here on new services within the Globus research data management platform that enable the specification of diverse research processes as reusable sets of actions, \emph{flows}, and the execution of such flows in heterogeneous research environments. To support flows with broad spatial extent (e.g., from scientific instrument to remote data center) and temporal extent (from seconds to weeks), these Globus automation services feature: 1) cloud hosting for reliable execution of even long-lived flows despite sporadic failures; 2) a simple specification and extensible asynchronous action provider API, for defining and executing a wide variety of actions and flows involving heterogeneous resources; 3) an event-driven execution model for automating execution of flows in response to arbitrary events; and 4) a rich security model enabling authorization delegation mechanisms for secure execution of long-running actions across distributed resources. These services permit researchers to outsource and automate the management of a broad range of research tasks to a reliable, scalable, and secure cloud platform. We present use cases for Globus automation services, describe their design and implementation, present microbenchmark studies, and review experiences applying the services in a range of applications.
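To make the notion of a flow concrete, here is a minimal single-action flow definition, sketched as a Python dict in the States-Language-style format Globus flows use. The action URL and parameter names are indicative rather than authoritative, and the endpoint IDs and paths are placeholders; consult the Globus documentation for the exact action provider schemas.

```python
# Indicative sketch of a one-step flow: a single transfer action, then done.
# Endpoint IDs and paths are placeholders.
flow_definition = {
    "StartAt": "TransferData",
    "States": {
        "TransferData": {
            "Type": "Action",
            "ActionUrl": "https://actions.globus.org/transfer/transfer",
            "Parameters": {
                "source_endpoint_id": "<SOURCE-ENDPOINT-UUID>",
                "destination_endpoint_id": "<DEST-ENDPOINT-UUID>",
                "transfer_items": [
                    {
                        "source_path": "/instrument/run42/",
                        "destination_path": "/project/run42/",
                        "recursive": True,
                    }
                ],
            },
            "ResultPath": "$.TransferResult",
            "End": True,
        }
    },
}
```

A flow like this would be registered once with the automation services and then run many times, including automatically in response to events such as new data appearing at the instrument.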
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Japan > Honshū > Kansai > Hyogo Prefecture > Kobe (0.04)
- Workflow (0.70)
- Research Report (0.70)
- Information Technology > Services (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Energy (0.93)
Bea Stollnitz - Creating batch endpoints in Azure ML
Suppose you've trained a machine learning model to accomplish some task, and you'd now like to provide that model's inference capabilities as a service. Maybe you're writing an application of your own that will rely on this service, or perhaps you want to make the service available to others. This is the purpose of endpoints -- they provide a simple web-based API for feeding data to your model and getting back inference results. Azure ML currently supports three types of endpoints: batch endpoints, Kubernetes online endpoints, and managed online endpoints. I'm going to focus on batch endpoints in this post, but let me start by explaining how the three types differ. Batch endpoints are designed to handle large requests, working asynchronously and generating results that are held in blob storage.
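As a sketch of what this looks like with the Azure ML Python SDK v2 (resource names and the data path below are placeholders, the invoke call follows SDK v2 examples and may differ slightly across SDK versions, and a batch deployment with a registered model and compute cluster still has to be attached before the endpoint can score anything):

```python
# One way to create and call a batch endpoint with the Azure ML Python SDK v2.
# Subscription, resource group, workspace, and data path are placeholders.
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient, Input
from azure.ai.ml.entities import BatchEndpoint

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION-ID>",
    resource_group_name="<RESOURCE-GROUP>",
    workspace_name="<WORKSPACE>",
)

# Create (or update) the batch endpoint itself.
endpoint = BatchEndpoint(
    name="my-batch-endpoint",
    description="Scores large batches asynchronously",
)
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

# After a deployment exists, submit a scoring job; results land in blob storage.
job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint.name,
    input=Input(path="azureml://datastores/workspaceblobstore/paths/my-input-data/"),
)
print(job.name)
```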