AITopics

Multi-agent reinforcement learning (MARL) requires coordinated and stable policy updates among interacting agents. Heterogeneous-Agent Trust Region Policy Optimization (HATRPO) enforces per-agent trust region constraints using Kullback-Leibler (KL) divergence to stabilize training. However, assigning each agent the same KL threshold can lead to slow and locally optimal updates, especially in heterogeneous settings. To address this limitation, we propose two approaches for allocating the KL divergence threshold across agents: HATRPO-W, a Karush-Kuhn-Tucker-based (KKT-based) method that optimizes threshold assignment under global KL constraints, and HATRPO-G, a greedy algorithm that prioritizes agents based on improvement-to-divergence ratio. By connecting sequential policy optimization with constrained threshold scheduling, our approach enables more flexible and effective learning in heterogeneous-agent settings. Experimental results demonstrate that our methods significantly boost the performance of HATRPO, achieving faster convergence and higher final rewards across diverse MARL benchmarks. Specifically, HATRPO-W and HATRPO-G achieve comparable improvements in final performance, each exceeding 22.5%. Notably, HATRPO-W also demonstrates more stable learning dynamics, as reflected by its lower variance.
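The greedy idea mentioned in the abstract (prioritizing agents by improvement-to-divergence ratio under a shared KL budget) can be illustrated with a small sketch. This is not the paper's algorithm: the function name `greedy_kl_allocation`, the inputs `gains` and `kl_costs`, and the rule of granting each agent its full requested KL until the budget runs out are all assumptions made for illustration.

```python
from typing import List


def greedy_kl_allocation(
    gains: List[float], kl_costs: List[float], total_budget: float
) -> List[float]:
    """Hypothetical sketch: split a global KL budget across agents.

    gains[i]    -- assumed estimate of agent i's expected improvement
    kl_costs[i] -- assumed KL divergence agent i would like to spend (> 0)
    Agents are served in decreasing order of gains[i] / kl_costs[i].
    """
    if len(gains) != len(kl_costs):
        raise ValueError("gains and kl_costs must have the same length")
    if any(c <= 0 for c in kl_costs):
        raise ValueError("kl_costs must be strictly positive")

    # Rank agents by improvement-to-divergence ratio, best first.
    order = sorted(
        range(len(gains)), key=lambda i: gains[i] / kl_costs[i], reverse=True
    )

    thresholds = [0.0] * len(gains)
    remaining = total_budget
    for i in order:
        if remaining <= 0:
            break
        # Grant the requested KL, or whatever is left of the budget.
        grant = min(kl_costs[i], remaining)
        thresholds[i] = grant
        remaining -= grant
    return thresholds


if __name__ == "__main__":
    print(greedy_kl_allocation([1.0, 3.0, 1.0], [0.02, 0.01, 0.04], 0.04))
```

In this toy run, the agent with the highest ratio is served first and the lowest-ratio agent receives only the leftover budget, so the per-agent thresholds differ while their sum stays within the global limit.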

agent, artificial intelligence, ha trpo, (16 more...)

2508.1034

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)

Chen, Jiehua, Roy, Sanjukta, Simola, Sofia

FPT-Approximability of Stable Matching Problems

We study parameterized approximability of three optimization problems related to stable matching: (1) Min-BP-SMI: Given a stable marriage instance and a number k, find a size-at-least-k matching that minimizes the number $β$ of blocking pairs; (2) Min-BP-SRI: Given a stable roommates instance, find a matching that minimizes the number $β$ of blocking pairs; (3) Max-SMTI: Given a stable marriage instance with preferences containing ties, find a maximum-size stable matching. The first two problems are known to be NP-hard to approximate to any constant factor and W[1]-hard with respect to $β$, making the existence of an EPTAS or FPT-algorithms unlikely. We show that they are W[1]-hard with respect to $β$ to approximate to any function of $β$. This means that unless FPT=W[1], there is no FPT-approximation scheme for the parameter $β$. The last problem (Max-SMTI) is known to be NP-hard to approximate to factor-29/33 and W[1]-hard with respect to the number of ties. We complement this and present an FPT-approximation scheme for the parameter "number of agents with ties".
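The quantity $β$ minimized in Min-BP-SMI can be made concrete with a short sketch that counts blocking pairs under the standard definition for stable marriage with incomplete lists: a mutually acceptable pair that is not matched together, where each side is either unmatched or prefers the other to its current partner. The function name and the dictionary-based input format are assumptions for illustration, and the sketch only evaluates a given matching; it does not implement any algorithm from the paper.

```python
from typing import Dict, List, Optional


def count_blocking_pairs(
    men_prefs: Dict[str, List[str]],
    women_prefs: Dict[str, List[str]],
    matching: Dict[str, str],
) -> int:
    """Count blocking pairs of `matching` (a dict man -> woman).

    Preference lists are ordered from most to least preferred and may be
    incomplete. Every matched pair is assumed to be mutually acceptable.
    """
    partner_of_woman = {w: m for m, w in matching.items()}

    def prefers(prefs: List[str], new: str, current: Optional[str]) -> bool:
        # An unmatched agent prefers any acceptable partner.
        return current is None or prefs.index(new) < prefs.index(current)

    count = 0
    for m, m_list in men_prefs.items():
        for w in m_list:
            w_list = women_prefs.get(w, [])
            if m not in w_list:
                continue  # not mutually acceptable
            if matching.get(m) == w:
                continue  # already matched to each other
            if prefers(m_list, w, matching.get(m)) and prefers(
                w_list, m, partner_of_woman.get(w)
            ):
                count += 1
    return count


if __name__ == "__main__":
    men = {"a": ["x", "y"], "b": ["x", "y"]}
    women = {"x": ["a", "b"], "y": ["a", "b"]}
    print(count_blocking_pairs(men, women, {"a": "y", "b": "x"}))
```

A matching is stable exactly when this count is zero, so the function doubles as a stability check on small instances.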

agent, artificial intelligence, matching, (16 more...)

2508.10129

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.35)

Zambare, Pallavi, Thanikella, Venkata Nikhil, Liu, Ying

Securing Agentic AI: Threat Modeling and Risk Analysis for Network Monitoring Agentic AI System

Combining Large Language Models (LLMs) with autonomous agents in network monitoring and decision-making systems creates serious security issues. In this research, the MAESTRO framework, a seven-layer threat modeling architecture, was used to expose, evaluate, and eliminate vulnerabilities of agentic AI. A prototype agent system was built using Python, LangChain, and WebSocket telemetry, and deployed with inference, memory, parameter tuning, and anomaly detection modules. Two practical threat cases were confirmed: (i) resource denial of service by traffic replay, and (ii) memory poisoning by tampering with the historical log file maintained by the agent. Both cases caused measurable performance degradation: telemetry updates were delayed and computational load increased as a result of poor system adaptations. The authors recommend a multilayered defense-in-depth approach with memory isolation, planner validation, and real-time anomaly response systems. These findings verify that MAESTRO is viable for operational threat mapping, prospective risk scoring, and as a basis for resilient system design. The authors highlight the importance of enforcing memory integrity, monitoring adaptation logic, and protecting cross-layer communication to ensure the reliability of agentic AI in adversarial settings.

data mining, machine learning, natural language, (18 more...)

2508.10043

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

Competitive Algorithms for Multi-Agent Ski-Rental Problems

Wang, Xuchuang, Sun, Bo, Beyhaghi, Hedyeh, Lui, John C. S., Hajiesmaili, Mohammad, Wierman, Adam

This paper introduces a novel multi-agent ski-rental problem that generalizes the classical ski-rental dilemma to a group setting where agents incur individual and shared costs. In our model, each agent can either rent at a fixed daily cost, or purchase a pass at an individual cost, with an additional third option of a discounted group pass available to all. We consider scenarios in which agents' active days differ, leading to dynamic states as agents drop out of the decision process. To address this problem from different perspectives, we define three distinct competitive ratios: overall, state-dependent, and individual rational. For each objective, we design and analyze optimal deterministic and randomized policies. Our deterministic policies employ state-aware threshold functions that adapt to the dynamic states, while our randomized policies sample and resample thresholds from tailored state-aware distributions. The analysis reveals that symmetric policies, in which all agents use the same threshold, outperform asymmetric ones. Our results provide competitive ratio upper and lower bounds and extend classical ski-rental insights to multi-agent settings, highlighting both theoretical and practical implications for group decision-making under uncertainty.
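For context, the classical single-agent ski-rental problem that this paper generalizes has a well-known deterministic break-even policy: rent until the rent paid would reach the purchase price, then buy. The sketch below shows only that classical baseline, not the multi-agent or state-aware policies from the paper; the function names and integer cost parameters are assumptions for illustration.

```python
def break_even_cost(active_days: int, buy_cost: int, rent_cost: int = 1) -> int:
    """Cost of the classical break-even policy (single agent).

    Rent on days 1 .. threshold-1 and buy on day `threshold`, where
    threshold = ceil(buy_cost / rent_cost).
    """
    threshold = -(-buy_cost // rent_cost)  # integer ceiling division
    if active_days < threshold:
        # The season ended before the break-even day: only rent was paid.
        return active_days * rent_cost
    return (threshold - 1) * rent_cost + buy_cost


def offline_optimum(active_days: int, buy_cost: int, rent_cost: int = 1) -> int:
    """Cost of the optimal decision when the number of active days is known."""
    return min(active_days * rent_cost, buy_cost)


def worst_case_ratio(buy_cost: int, rent_cost: int = 1, horizon: int = 1000) -> float:
    """Largest online/offline cost ratio over season lengths 1 .. horizon."""
    return max(
        break_even_cost(d, buy_cost, rent_cost) / offline_optimum(d, buy_cost, rent_cost)
        for d in range(1, horizon + 1)
    )


if __name__ == "__main__":
    print(worst_case_ratio(buy_cost=10))
```

With a daily rent of 1 and a purchase price of B, the worst case occurs when the season ends right after buying, giving a ratio of (2B - 1)/B = 2 - 1/B. The paper's threshold policies adapt this kind of threshold to the number of agents still active.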

agent, artificial intelligence, competitive ratio, (15 more...)

2507.15727

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (0.67)
Transportation > Electric Vehicle (0.67)
Automobiles & Trucks (0.67)
Energy > Renewable > Solar (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)

Neural Information Processing Systems, Aug-14-2025, 23:12:52 GMT

57444e14ecd9e2c8f603b4f012ce3811-Paper-Conference.pdf

agent, decentralized shield, shield, (17 more...)

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Neural Information Processing Systems, Aug-14-2025, 21:39:32 GMT

FACMAC: Factored Multi-Agent Centralised Policy Gradients

Bei Peng (University of Liverpool), Tabish Rashid (University of Oxford), Christian A. Schroeder de Witt

However, unlike QMIX, there are no inherent constraints on factoring the critic. We thus also employ a nonmonotonic factorisation and empirically demonstrate that its increased representational capacity allows it to solve some tasks that cannot be solved with monolithic, or monotonically factored critics.

agent, facmac, policy gradient, (12 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.41)
Europe > Switzerland (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Neural Information Processing Systems, Aug-14-2025, 16:08:49 GMT

594ca7adb3277c51a998252e2d4c906e-Paper.pdf

agent, objective, sdp system, (12 more...)

Country:

Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation (0.71)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Neural Information Processing Systems, Aug-14-2025, 07:54:18 GMT

3d17b7f7d52c83ab6e97e2dc0bda2e71-Paper-Conference.pdf

arxiv preprint arxiv, interaction, mfg, (12 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Santa Monica (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Neural Information Processing Systems, Aug-14-2025, 06:33:20 GMT

K-level Reasoning for Zero-Shot Coordination in Hanabi

Work done while at Facebook AI Research. 35th Conference on Neural Information Processing Systems (NeurIPS 2021). Figure 1: Visualization of various hierarchical training schemas, including sequential KLR, synchronous KLR, synchronous CH, and our new SyKLRBR for 4 levels.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)
(5 more...)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Krawiecka, Klaudia, de Witt, Christian Schroeder

Extending the OWASP Multi-Agentic System Threat Modeling Guide: Insights from Multi-Agent Security Research

arXiv.org Artificial Intelligence, Aug-14-2025

We propose an extension to the OWASP Multi-Agentic System (MAS) Threat Modeling Guide, translating recent anticipatory research in multi-agent security (MASEC) into practical guidance for addressing challenges unique to large language model (LLM)-driven multi-agent architectures. Although OWASP's existing taxonomy covers many attack vectors, our analysis identifies gaps in modeling failures, including, but not limited to: reasoning collapse across planner-executor chains, metric overfitting, unsafe delegation escalation, emergent covert coordination, and heterogeneous multi-agent exploits. We introduce additional threat classes and scenarios grounded in practical MAS deployments, highlighting risks from benign goal drift, cross-agent hallucination propagation, affective prompt framing, and multi-agent backdoors. We also outline evaluation strategies, including robustness testing, coordination assessment, safety enforcement, and emergent behavior monitoring, to ensure complete coverage. This work complements the framework of OWASP by expanding its applicability to increasingly complex, autonomous, and adaptive multi-agent systems, with the goal of improving security posture and resilience in real-world deployments.

agent, artificial intelligence, coordination, (12 more...)

2508.09815

Country:

North America > United States (0.14)
Europe > United Kingdom (0.14)

Genre: Research Report (0.51)

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment > Games (0.68)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.50)