

 Agent Societies


Multi-Agent Trust Region Policy Optimisation: A Joint Constraint Approach

arXiv.org Artificial Intelligence

Multi-agent reinforcement learning (MARL) requires coordinated and stable policy updates among interacting agents. Heterogeneous-Agent Trust Region Policy Optimization (HATRPO) enforces per-agent trust region constraints using Kullback-Leibler (KL) divergence to stabilize training. However, assigning each agent the same KL threshold can lead to slow, locally optimal updates, especially in heterogeneous settings. To address this limitation, we propose two approaches for allocating the KL divergence threshold across agents: HATRPO-W, a Karush-Kuhn-Tucker-based (KKT-based) method that optimizes the threshold assignment under global KL constraints, and HATRPO-G, a greedy algorithm that prioritizes agents by their improvement-to-divergence ratio. By connecting sequential policy optimization with constrained threshold scheduling, our approach enables more flexible and effective learning in heterogeneous-agent settings. Experimental results show that our methods significantly improve the performance of HATRPO, achieving faster convergence and higher final rewards across diverse MARL benchmarks. Specifically, HATRPO-W and HATRPO-G achieve comparable improvements in final performance, each exceeding 22.5%. Notably, HATRPO-W also exhibits more stable learning dynamics, as reflected by its lower variance.
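To make the allocation idea concrete, here is a minimal sketch under an assumption the abstract does not state: that agent $i$'s surrogate improvement under a KL budget $\delta_i$ follows the linearised trust-region estimate $a_i\sqrt{\delta_i}$, with $a_i = \sqrt{2\,g_i^\top F_i^{-1} g_i}$ built from the agent's policy gradient $g_i$ and Fisher matrix $F_i$. Maximising $\sum_i a_i\sqrt{\delta_i}$ subject to $\sum_i \delta_i = \delta$ then has the KKT solution $\delta_i = \delta\, a_i^2 / \sum_j a_j^2$, while a greedy variant hands out the budget in small chunks to whichever agent currently offers the best improvement-to-divergence ratio. The improvement model and all function names are hypothetical; this is not the authors' HATRPO-W or HATRPO-G code.

```python
import heapq
import math


def improvement(a, alloc):
    """Hypothetical linearised model: agent i gains a[i] * sqrt(delta_i)."""
    return sum(ai * math.sqrt(di) for ai, di in zip(a, alloc))


def kkt_allocation(a, total_kl):
    """Closed-form optimum of max sum a_i*sqrt(d_i) s.t. sum d_i = total_kl.

    Stationarity a_i / (2*sqrt(d_i)) = lambda implies d_i is proportional to a_i**2.
    """
    norm = sum(ai * ai for ai in a)
    return [total_kl * ai * ai / norm for ai in a]


def greedy_allocation(a, total_kl, steps=1000):
    """Give each equal chunk of the KL budget to the agent whose next chunk
    yields the highest improvement per unit of KL divergence."""
    chunk = total_kl / steps
    alloc = [0.0] * len(a)

    def ratio(i):
        # Marginal improvement-to-divergence ratio of one more chunk for agent i.
        gain = a[i] * (math.sqrt(alloc[i] + chunk) - math.sqrt(alloc[i]))
        return gain / chunk

    # Max-heap via negated ratios; only the chosen agent's ratio changes per step.
    heap = [(-ratio(i), i) for i in range(len(a))]
    heapq.heapify(heap)
    for _ in range(steps):
        _, i = heapq.heappop(heap)
        alloc[i] += chunk
        heapq.heappush(heap, (-ratio(i), i))
    return alloc


if __name__ == "__main__":
    a = [3.0, 1.0, 1.0]  # hypothetical per-agent improvement coefficients
    total = 0.05         # hypothetical global KL budget
    uniform = [total / len(a)] * len(a)
    for name, alloc in [("uniform", uniform),
                        ("kkt", kkt_allocation(a, total)),
                        ("greedy", greedy_allocation(a, total))]:
        print(name, [round(d, 4) for d in alloc], round(improvement(a, alloc), 4))
```

Under this concave model the greedy chunks converge to the KKT split as `steps` grows, and both beat the equal-threshold baseline whenever the $a_i$ differ, which mirrors the motivation in the abstract.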


FPT-Approximability of Stable Matching Problems

arXiv.org Artificial Intelligence

We study the parameterized approximability of three optimization problems related to stable matching: (1) Min-BP-SMI: given a stable marriage instance and a number $k$, find a matching of size at least $k$ that minimizes the number $\beta$ of blocking pairs; (2) Min-BP-SRI: given a stable roommates instance, find a matching that minimizes the number $\beta$ of blocking pairs; (3) Max-SMTI: given a stable marriage instance with preferences containing ties, find a maximum-size stable matching. The first two problems are known to be NP-hard to approximate within any constant factor and W[1]-hard with respect to $\beta$, making the existence of an EPTAS or of FPT algorithms unlikely. We show that they are W[1]-hard with respect to $\beta$ to approximate within any function of $\beta$. This means that, unless FPT = W[1], there is no FPT-approximation scheme for the parameter $\beta$. The last problem (Max-SMTI) is known to be NP-hard to approximate within a factor of 29/33 and W[1]-hard with respect to the number of ties. We complement this by presenting an FPT-approximation scheme for the parameter "number of agents with ties".
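For readers new to the Min-BP-SMI objective, the sketch below counts the blocking pairs $\beta$ of a given matching in a stable marriage instance with incomplete lists: a mutually acceptable pair $(m, w)$ that is not matched together blocks if each is unmatched or prefers the other to their current partner. It only evaluates $\beta$ for a fixed matching; it is not one of the approximation algorithms studied in the paper, and the dictionary-based input format is an assumption.

```python
def blocking_pairs(men_prefs, women_prefs, matching):
    """Return the blocking pairs of `matching` in a stable marriage instance.

    men_prefs / women_prefs map each agent to a preference list (most preferred
    first); agents missing from a list are unacceptable. `matching` maps matched
    men to their (acceptable) partners; unmatched men are simply absent.
    """
    wife = dict(matching)
    husband = {w: m for m, w in matching.items()}
    rank_m = {m: {w: r for r, w in enumerate(p)} for m, p in men_prefs.items()}
    rank_w = {w: {m: r for r, m in enumerate(p)} for w, p in women_prefs.items()}

    pairs = []
    for m, prefs in men_prefs.items():
        for w in prefs:
            if wife.get(m) == w or m not in rank_w.get(w, {}):
                continue  # already partners, or not mutually acceptable
            m_wants = m not in wife or rank_m[m][w] < rank_m[m][wife[m]]
            w_wants = w not in husband or rank_w[w][m] < rank_w[w][husband[w]]
            if m_wants and w_wants:
                pairs.append((m, w))
    return pairs


men = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
women = {"w1": ["m1", "m2"], "w2": ["m1", "m2"]}
print(blocking_pairs(men, women, {"m1": "w2", "m2": "w1"}))  # [('m1', 'w1')]
print(blocking_pairs(men, women, {"m1": "w1", "m2": "w2"}))  # []
```

A matching is stable exactly when this list is empty, so $\beta = 0$; Min-BP-SMI asks for a size-constrained matching that makes the list as short as possible.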


Securing Agentic AI: Threat Modeling and Risk Analysis for Network Monitoring Agentic AI System

arXiv.org Artificial Intelligence

Combining Large Language Models (LLMs) with autonomous agents in network monitoring and decision-making systems creates serious security issues. In this research, the MAESTRO framework, a seven-layer threat modeling architecture, was used to expose, evaluate, and eliminate vulnerabilities of agentic AI. A prototype agent system was built in Python with LangChain and WebSocket-based telemetry, and deployed with inference, memory, parameter tuning, and anomaly detection modules. Two practical threat cases were confirmed: (i) resource denial of service via traffic-replay denial-of-service, and (ii) memory poisoning via tampering with the historical log file maintained by the agent. Both caused measurable performance degradation: telemetry updates were delayed and computational load increased as a result of poor system adaptation. A multilayered defense-in-depth approach is proposed, combining memory isolation, planner validation, and real-time anomaly response. These findings confirm that MAESTRO is viable for operational threat mapping and prospective risk scoring, and as a basis for resilient system design. The authors highlight the importance of enforcing memory integrity, monitoring adaptation logic, and protecting cross-layer communication to guarantee the reliability of agentic AI in adversarial settings.
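As one illustration of the memory-integrity enforcement called for above, the sketch below protects an agent's historical log with an HMAC chain, so that editing or deleting an earlier entry (the memory-poisoning case) is detected on verification. This is a generic technique assumed for illustration, not part of MAESTRO or the authors' prototype; the key is a placeholder, and dropping the newest entries is not detected unless the latest MAC is also stored out of the agent's reach.

```python
import hashlib
import hmac
import json

# Placeholder key for illustration only; a real deployment would load it from a secret store.
KEY = b"hypothetical-log-key"


def _mac(prev_mac, record, key):
    # Bind each entry to its predecessor so reordering or removal breaks the chain.
    payload = prev_mac + json.dumps(record, sort_keys=True)
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def append_entry(log, record, key=KEY):
    prev = log[-1]["mac"] if log else ""
    log.append({"record": record, "mac": _mac(prev, record, key)})


def verify_log(log, key=KEY):
    prev = ""
    for entry in log:
        if not hmac.compare_digest(_mac(prev, entry["record"], key), entry["mac"]):
            return False
        prev = entry["mac"]
    return True


log = []
for i, latency_ms in enumerate([12, 15, 11]):
    append_entry(log, {"t": i, "latency_ms": latency_ms})
print(verify_log(log))  # True
log[1]["record"]["latency_ms"] = 900  # simulated poisoning of a past entry
print(verify_log(log))  # False
```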


Competitive Algorithms for Multi-Agent Ski-Rental Problems

arXiv.org Artificial Intelligence

This paper introduces a novel multi-agent ski-rental problem that generalizes the classical ski-rental dilemma to a group setting where agents incur individual and shared costs. In our model, each agent can either rent at a fixed daily cost or purchase an individual pass, and a third option, a discounted group pass, is available to all agents. We consider scenarios in which agents' active days differ, leading to dynamic states as agents drop out of the decision process. To address this problem from different perspectives, we define three distinct competitive ratios: overall, state-dependent, and individually rational. For each objective, we design and analyze optimal deterministic and randomized policies. Our deterministic policies employ state-aware threshold functions that adapt to the dynamic states, while our randomized policies sample and resample thresholds from tailored state-aware distributions. The analysis reveals that symmetric policies, in which all agents use the same threshold, outperform asymmetric ones. Our results provide upper and lower bounds on the competitive ratios and extend classical ski-rental insights to multi-agent settings, highlighting both theoretical and practical implications for group decision-making under uncertainty.
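For reference, the classical single-agent problem that this work generalizes is solved deterministically by a break-even threshold: with rent at 1 per day and a pass costing $B$, renting for $B-1$ days and buying on day $B$ costs at most $2B-1$ when the offline optimum is $B$, a competitive ratio of $2 - 1/B$. The sketch below brute-forces the worst-case ratio of every "buy on day $t$" threshold to confirm this; unit rent, integer days, and the finite horizon are assumptions, and it does not model the paper's group pass or state-aware multi-agent thresholds.

```python
def alg_cost(days, buy_day, price):
    """Cost of 'rent at 1 per day, buy on day buy_day' if skiing lasts `days` days."""
    if days < buy_day:
        return days                   # never reached the buy day: rented every day
    return (buy_day - 1) + price      # rented buy_day - 1 days, then bought


def worst_ratio(buy_day, price, horizon):
    """Worst-case competitive ratio against the offline optimum min(days, price)."""
    return max(alg_cost(d, buy_day, price) / min(d, price)
               for d in range(1, horizon + 1))


price, horizon = 10, 100
best_day = min(range(1, horizon + 1), key=lambda t: worst_ratio(t, price, horizon))
print(best_day, worst_ratio(best_day, price, horizon))  # 10 1.9
```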



FACMAC: Factored Multi-Agent Centralised Policy Gradients

Bei Peng (University of Liverpool), Tabish Rashid (University of Oxford), Christian A. Schroeder de Witt

Neural Information Processing Systems

Unlike in QMIX, there are no inherent constraints on factoring the critic. We therefore also employ a nonmonotonic factorisation and empirically demonstrate that its increased representational capacity allows it to solve some tasks that cannot be solved with monolithic or monotonically factored critics.
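A toy example (ours, not from the FACMAC paper) shows why monotonicity limits representational capacity. Take two agents with binary actions and a joint value of 1 exactly when their actions differ. If $Q_{tot} = f(Q_1(a_1), Q_2(a_2))$ with $f$ nondecreasing in each argument, as in QMIX, then the action an agent values less can never yield a higher $Q_{tot}$ for the same partner action; the code checks that every such utility ordering contradicts the target. A nonmonotonic mixer such as $q_1 + q_2 - 2q_1q_2$ with $Q_i(a_i) = a_i$ reproduces it exactly. The mixer and utilities are illustrative assumptions, not FACMAC's learned mixing network.

```python
from itertools import product

# Target joint value: 1 when the two agents pick different actions (XOR), else 0.
TARGET = {(a1, a2): float(a1 != a2) for a1, a2 in product((0, 1), repeat=2)}


def monotone_possible(target, low1, low2):
    """Necessary condition for Q_tot = f(Q1(a1), Q2(a2)) with f nondecreasing in both
    arguments, assuming Q1(low1) <= Q1(other action) and Q2(low2) <= Q2(other action)."""
    hi1, hi2 = 1 - low1, 1 - low2
    if any(target[(low1, b)] > target[(hi1, b)] for b in (0, 1)):
        return False
    if any(target[(a, low2)] > target[(a, hi2)] for a in (0, 1)):
        return False
    return True


def nonmonotonic_mix(q1, q2):
    # dQ_tot/dq1 = 1 - 2*q2 is negative once q2 > 0.5, so this mixer is not monotonic.
    return q1 + q2 - 2 * q1 * q2


# Every utility ordering (ties satisfy both orderings) is ruled out for monotonic mixing.
print(any(monotone_possible(TARGET, l1, l2) for l1, l2 in product((0, 1), repeat=2)))  # False
# With Q_i(a_i) = a_i, the nonmonotonic mixer matches the target exactly.
print(all(nonmonotonic_mix(a1, a2) == v for (a1, a2), v in TARGET.items()))  # True
```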




K-level Reasoning for Zero-Shot Coordination in Hanabi

Neural Information Processing Systems

Work done while at Facebook AI Research.

35th Conference on Neural Information Processing Systems (NeurIPS 2021).

Figure 1: Visualization of various hierarchical training schemas, including sequential KLR, synchronous KLR, synchronous CH, and our new SyKLRBR, for 4 levels.
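The schemas in Figure 1 differ mainly in whom each level trains with and when levels are updated. The sketch below encodes the standard definitions as we understand them, not the authors' code: in K-level reasoning (KLR), level $k$ best-responds to level $k-1$; in a cognitive hierarchy (CH), level $k$ responds to a mixture of all lower levels, here assumed uniform; sequential training finishes one level before starting the next, while synchronous training updates all levels in every phase. Level 0 is treated as a fixed base policy, the function names are hypothetical, and the best-response component of SyKLRBR is not modelled.

```python
def partner_weights(level, scheme):
    """Hypothetical partner distribution for the policy at `level` (level 0 is a fixed base)."""
    if level < 1:
        raise ValueError("level 0 is not trained in this sketch")
    if scheme == "klr":
        return {level - 1: 1.0}                        # respond to the level just below
    if scheme == "ch":
        return {j: 1.0 / level for j in range(level)}  # uniform mixture of lower levels (assumed)
    raise ValueError(f"unknown scheme: {scheme}")


def training_phases(num_levels, synchronous, phases_per_level=1):
    """Which trainable levels (1..num_levels) are updated together in each phase."""
    levels = list(range(1, num_levels + 1))
    if synchronous:
        # All levels train in parallel, each against the current snapshot of its partners.
        return [list(levels) for _ in range(phases_per_level * num_levels)]
    # Sequential: level k only starts once level k-1 has finished.
    return [[k] for k in levels for _ in range(phases_per_level)]


for k in range(1, 5):
    print(k, partner_weights(k, "klr"), partner_weights(k, "ch"))
print(training_phases(4, synchronous=False))  # [[1], [2], [3], [4]]
```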


Extending the OWASP Multi-Agentic System Threat Modeling Guide: Insights from Multi-Agent Security Research

arXiv.org Artificial Intelligence

We propose an extension to the OWASP Multi-Agentic System (MAS) Threat Modeling Guide, translating recent anticipatory research in multi-agent security (MASEC) into practical guidance for addressing challenges unique to large language model (LLM)-driven multi-agent architectures. Although OWASP's existing taxonomy covers many attack vectors, our analysis identifies gaps in modeling failures, including, but not limited to, reasoning collapse across planner-executor chains, metric overfitting, unsafe delegation escalation, emergent covert coordination, and heterogeneous multi-agent exploits. We introduce additional threat classes and scenarios grounded in practical MAS deployments, highlighting risks from benign goal drift, cross-agent hallucination propagation, affective prompt framing, and multi-agent backdoors. We also outline evaluation strategies, including robustness testing, coordination assessment, safety enforcement, and emergent behavior monitoring, to ensure complete coverage. This work complements the OWASP framework by expanding its applicability to increasingly complex, autonomous, and adaptive multi-agent systems, with the goal of improving security posture and resilience in real-world deployments.