Goto

Collaborating Authors

 Agent Societies


Convergence Guarantees for Federated SARSA with Local Training and Heterogeneous Agents

Mangold, Paul, Berthier, Eloïse, Moulines, Eric

arXiv.org Machine Learning

We present a novel theoretical analysis of Federated SARSA (FedSARSA) with linear function approximation and local training. We establish convergence guarantees for FedSARSA in the presence of heterogeneity, both in local transitions and rewards, providing the first sample and communication complexity bounds in this setting. At the core of our analysis is a new, exact multi-step error expansion for single-agent SARSA, which is of independent interest. Our analysis precisely quantifies the impact of heterogeneity, demonstrating the convergence of FedSARSA with multiple local updates. Crucially, we show that FedSARSA achieves linear speed-up with respect to the number of agents, up to higher-order terms due to Markovian sampling. Numerical experiments support our theoretical findings.


Distributionally Robust Markov Games with Average Reward

Roch, Zachary, Wang, Yue

arXiv.org Artificial Intelligence

We study distributionally robust Markov games (DR-MGs) with the average-reward criterion, a framework for multi-agent decision-making under uncertainty over extended horizons. In average reward DR-MGs, agents aim to maximize their worst-case infinite-horizon average reward, to ensure satisfactory performance under environment uncertainties and opponent actions. We first establish a connection between the best-response policies and the optimal policies for the induced single-agent problems. Under a standard irreducible assumption, we derive a correspondence between the optimal policies and the solutions of the robust Bellman equation, and derive the existence of stationary Nash Equilibrium (NE) based on these results. We further study DR-MGs under the weakly communicating setting, where we construct a set-valued map and show its value is a subset of the best-response policies, convex and upper hemi-continuous, and derive the existence of NE. We then explore algorithmic solutions, by first proposing a Robust Nash-Iteration algorithm and providing convergence guarantees under some additional assumptions and a NE computing oracle. We further develop a temporal-difference based algorithm for DR-MGs, and provide convergence guarantees without any additional oracle or assumptions. Finally, we connect average-reward robust NE to discounted ones, showing that the average reward robust NE can be approximated by the discounted ones under a large discount factor. Our studies provide a comprehensive theoretical and algorithmic foundation for decision-making in complex, uncertain, and long-running multi-player environments.


Robustness and Adaptability of Reinforcement Learning based Cooperative Autonomous Driving in Mixed-autonomy Traffic

Valiente, Rodolfo, Toghi, Behrad, Pedarsani, Ramtin, Fallah, Yaser P.

arXiv.org Artificial Intelligence

HE development of autonomous vehicles (A Vs) is on the verge of passing beyond the laboratory and simulation tests and is shifting towards addressing the challenges that limit their practicality in today's society. While there is still need for further technological improvements to enable safe and smooth operation of a single A V, a great deal of research attention is being focused on the emerging challenge of operating multiple A Vs and the co-existence of A Vs and human-driven vehicles (HVs) [1], [2]. A realistic outlook for the adoption of autonomous vehicles on the roads is a mixed-traffic scenario in which human drivers with different driving styles and social preferences share the road with A Vs that are perhaps built by different manufacturers and hence follow different policies [3], [4]. In this work, we seek a solution that can ensure the safety and robustness of A Vs in the presence of human drivers with heterogeneous behavioral traits. Connected & autonomous vehicles (CA Vs) via vehicle-to-vehicle (V2V) communication allow vehicles to directly communicate with their neighbors, creating an extended perception that enables explicit coordination among vehicles to overcome the limitations of an isolated agent [5]-[11]. While planning in a fully A V scenario is relatively easy to achieve, coordination in the presence of HVs is a significantly more challenging task, as the A Vs not only need to react to road objects but also need to consider the behaviors of HVs [3], [4], [12]. We start by identifying the major challenges in the domain of behavior planning and prediction for A Vs in mixed-autonomy traffic.


Distributed scalable coupled policy algorithm for networked multi-agent reinforcement learning

Dai, Pengcheng, Wang, Dongming, Yu, Wenwu, Ren, Wei

arXiv.org Artificial Intelligence

This paper studies networked multi-agent reinforcement learning (NMARL) with interdependent rewards and coupled policies. In this setting, each agent's reward depends on its own state-action pair as well as those of its direct neighbors, and each agent's policy is parameterized by its local parameters together with those of its $κ_{p}$-hop neighbors, with $κ_{p}\geq 1$ denoting the coupled radius. The objective of the agents is to collaboratively optimize their policies to maximize the discounted average cumulative reward. To address the challenge of interdependent policies in collaborative optimization, we introduce a novel concept termed the neighbors' averaged $Q$-function and derive a new expression for the coupled policy gradient. Based on these theoretical foundations, we develop a distributed scalable coupled policy (DSCP) algorithm, where each agent relies only on the state-action pairs of its $κ_{p}$-hop neighbors and the rewards of its $(κ_{p}+1)$-hop neighbors. Specially, in the DSCP algorithm, we employ a geometric 2-horizon sampling method that does not require storing a full $Q$-table to obtain an unbiased estimate of the coupled policy gradient. Moreover, each agent interacts exclusively with its direct neighbors to obtain accurate policy parameters, while maintaining local estimates of other agents' parameters to execute its local policy and collect samples for optimization. These estimates and policy parameters are updated via a push-sum protocol, enabling distributed coordination of policy updates across the network. We prove that the joint policy produced by the proposed algorithm converges to a first-order stationary point of the objective function. Finally, the effectiveness of DSCP algorithm is demonstrated through simulations in a robot path planning environment, showing clear improvement over state-of-the-art methods.


On Mobile Ad Hoc Networks for Coverage of Partially Observable Worlds

Meriaux, Edwin, Wen, Shuo, Langevin, Louis-Roy, Precup, Doina, Loría, Antonio, Dudek, Gregory

arXiv.org Artificial Intelligence

This paper addresses the movement and placement of mobile agents to establish a communication network in initially unknown environments. We cast the problem in a computational-geometric framework by relating the coverage problem and line-of-sight constraints to the Cooperative Guard Art Gallery Problem, and introduce its partially observable variant, the Partially Observable Cooperative Guard Art Gallery Problem (POCGAGP). We then present two algorithms that solve POCGAGP: CADENCE, a centralized planner that incrementally selects 270 degree corners at which to deploy agents, and DADENCE, a decentralized scheme that coordinates agents using local information and lightweight messaging. Both approaches operate under partial observability and target simultaneous coverage and connectivity. We evaluate the methods in simulation across 1,500 test cases of varied size and structure, demonstrating consistent success in forming connected networks while covering and exploring unknown space. These results highlight the value of geometric abstractions for communication-driven exploration and show that decentralized policies are competitive with centralized performance while retaining scalability.


Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies

Persson, Mika, Lidman, Jonas, Ljungberg, Jacob, Sandelius, Samuel, Andersson, Adam

arXiv.org Artificial Intelligence

This work presents a conceptual study on the application of Multi-Agent Reinforcement Learning (MARL) for decentralized control of unmanned aerial vehicles to relay a critical data package to a known position. For this purpose, a family of deterministic games is introduced, designed for scaling studies for MARL. A robust baseline policy is proposed, which is based on restricting agent motion envelopes and applying Dijkstra's algorithm. Experimental results show that two off-the-shelf MARL algorithms perform competitively with the baseline for a small number of agents, but scalability issues arise as the number of agents increase.


LLM Collaboration With Multi-Agent Reinforcement Learning

Liu, Shuo, Chen, Tianle, Liang, Zeyu, Lyu, Xueguang, Amato, Christopher

arXiv.org Artificial Intelligence

A large amount of work has been done in Multi-Agent Systems (MAS) for modeling and solving problems with multiple interacting agents. However, most LLMs are pretrained independently and not specifically optimized for coordination. For example, existing LLM fine-tuning frameworks rely on individual rewards, which require complex reward designs for each agent to encourage collaboration. To address this challenge, we model LLM collaboration as a cooperative Multi-Agent Reinforcement Learning (MARL) problem. We develop a multi-agent, multi-turn algorithm, Multi-Agent Group Relative Policy Optimization (MAGRPO), to solve it, building on current RL approaches for LLMs as well as MARL techniques. Our experiments on LLM writing and coding collaboration demonstrate that fine-tuning multiple LLMs with MAGRPO enables agents to generate high-quality responses efficiently through effective cooperation. Our approach opens the door to using MARL methods for LLM collaboration and highlights the associated challenges.


Multi-Agent Deep Reinforcement Learning for Collaborative UAV Relay Networks under Jamming Atatcks

Nguyen, Thai Duong, Nguyen, Ngoc-Tan, Nguyen, Thanh-Dao, Van Huynh, Nguyen, Tran, Dinh-Hieu, Chatzinotas, Symeon

arXiv.org Artificial Intelligence

The deployment of Unmanned Aerial Vehicle (UAV) swarms as dynamic communication relays is critical for next-generation tactical networks. However, operating in contested environments requires solving a complex trade-off, including maximizing system throughput while ensuring collision avoidance and resilience against adversarial jamming. Existing heuristic-based approaches often struggle to find effective solutions due to the dynamic and multi-objective nature of this problem. This paper formulates this challenge as a cooperative Multi-Agent Reinforcement Learning (MARL) problem, solved using the Centralized Training with Decentralized Execution (CTDE) framework. Our approach employs a centralized critic that uses global state information to guide decentralized actors which operate using only local observations. Simulation results show that our proposed framework significantly outperforms heuristic baselines, increasing the total system throughput by approximately 50% while simultaneously achieving a near-zero collision rate. A key finding is that the agents develop an emergent anti-jamming strategy without explicit programming. They learn to intelligently position themselves to balance the trade-off between mitigating interference from jammers and maintaining effective communication links with ground users.


Robust Agents in Open-Ended Worlds

Samvelyan, Mikayel

arXiv.org Artificial Intelligence

The growing prevalence of artificial intelligence (AI) in various applications underscores the need for agents that can successfully navigate and adapt to an ever-changing, open-ended world. A key challenge is ensuring these AI agents are robust, excelling not only in familiar settings observed during training but also effectively generalising to previously unseen and varied scenarios. In this thesis, we harness methodologies from open-endedness and multi-agent learning to train and evaluate robust AI agents capable of generalising to novel environments, out-of-distribution inputs, and interactions with other co-player agents. We begin by introducing MiniHack, a sandbox framework for creating diverse environments through procedural content generation. Based on the game of NetHack, MiniHack enables the construction of new tasks for reinforcement learning (RL) agents with a focus on generalisation. We then present Maestro, a novel approach for generating adversarial curricula that progressively enhance the robustness and generality of RL agents in two-player zero-sum games. We further probe robustness in multi-agent domains, utilising quality-diversity methods to systematically identify vulnerabilities in state-of-the-art, pre-trained RL policies within the complex video game football domain, characterised by intertwined cooperative and competitive dynamics. Finally, we extend our exploration of robustness to the domain of LLMs. Here, our focus is on diagnosing and enhancing the robustness of LLMs against adversarial prompts, employing evolutionary search to generate a diverse range of effective inputs that aim to elicit undesirable outputs from an LLM. This work collectively paves the way for future advancements in AI robustness, enabling the development of agents that not only adapt to an ever-evolving world but also thrive in the face of unforeseen challenges and interactions.


Large Language Models Miss the Multi-Agent Mark

La Malfa, Emanuele, La Malfa, Gabriele, Marro, Samuele, Zhang, Jie M., Black, Elizabeth, Luck, Michael, Torr, Philip, Wooldridge, Michael

arXiv.org Artificial Intelligence

Recent interest in Multi-Agent Systems of Large Language Models (MAS LLMs) has led to an increase in frameworks leveraging multiple LLMs to tackle complex tasks. However, much of this literature appropriates the terminology of MAS without engaging with its foundational principles. In this position paper, we highlight critical discrepancies between MAS theory and current MAS LLMs implementations, focusing on four key areas: the social aspect of agency, environment design, coordination and communication protocols, and measuring emergent behaviours. Our position is that many MAS LLMs lack multi-agent characteristics such as autonomy, social interaction, and structured environments, and often rely on oversimplified, LLM-centric architectures. The field may slow down and lose traction by revisiting problems the MAS literature has already addressed. Therefore, we systematically analyse this issue and outline associated research opportunities; we advocate for better integrating established MAS concepts and more precise terminology to avoid mischaracterisation and missed opportunities.