Goto

Collaborating Authors

 Agent Societies


HARBOR: Exploring Persona Dynamics in Multi-Agent Competition

arXiv.org Artificial Intelligence

We investigate factors contributing to LLM agents' success in competitive multi-agent environments, using auctions as a testbed where agents bid to maximize profit. The agents are equipped with bidding domain knowledge, distinct personas that reflect item preferences, and a memory of auction history. Our work extends the classic auction scenario by creating a realistic environment where multiple agents bid on houses, weighing aspects such as size, location, and budget to secure the most desirable homes at the lowest prices. Particularly, we investigate three key questions: (a) How does a persona influence an agent's behavior in a competitive setting? (b) Can an agent effectively profile its competitors' behavior during auctions? (c) How can persona profiling be leveraged to create an advantage using strategies such as theory of mind? Through a series of experiments, we analyze the behaviors of LLM agents and shed light on new findings. Our testbed, called HARBOR, offers a valuable platform for deepening our understanding of multi-agent workflows in competitive environments.


Mean-Field Bayesian Optimisation

arXiv.org Artificial Intelligence

We address the problem of optimising the average payoff for a large number of cooperating agents, where the payoff function is unknown and treated as a black box. While standard Bayesian Optimisation (BO) methods struggle with the scalability required for high-dimensional input spaces, we demonstrate how leveraging the mean-field assumption on the black-box function can transform BO into an efficient and scalable solution. Specifically, we introduce MF-GP-UCB, a novel efficient algorithm designed to optimise agent payoffs in this setting. Our theoretical analysis establishes a regret bound for MF-GP-UCB that is independent of the number of agents, contrasting sharply with the exponential dependence observed when naive BO methods are applied. We evaluate our algorithm on a diverse set of tasks, including real-world problems, such as optimising the location of public bikes for a bike-sharing programme, distributing taxi fleets, and selecting refuelling ports for maritime vessels. Empirical results demonstrate that MF-GP-UCB significantly outperforms existing benchmarks, offering substantial improvements in performance and scalability, constituting a promising solution for mean-field, black-box optimisation. The code is available at https://github.com/petarsteinberg/MF-BO.


Scalable Multi-Agent Offline Reinforcement Learning and the Role of Information

arXiv.org Artificial Intelligence

Offline Reinforcement Learning (RL) focuses on learning policies solely from a batch of previously collected data. of- fering the potential to leverage such datasets effectively without the need for costly or risky active exploration. While recent advances in Offline Multi-Agent RL (MARL) have shown promise, most existing methods either rely on large datasets jointly collected by all agents or agent-specific datasets collected independently. The former approach ensures strong performance but raises scalability concerns, while the latter emphasizes scalability at the expense of performance guarantees. In this work, we propose a novel scalable routine for both dataset collection and offline learning. Agents first collect diverse datasets coherently with a pre-specified information-sharing network and subsequently learn coherent localized policies without requiring either full observability or falling back to complete decentralization. We theoretically demonstrate that this structured approach allows a multi-agent extension of the seminal Fitted Q-Iteration (FQI) algorithm to globally converge, in high probability, to near-optimal policies. The convergence is subject to error terms that depend on the informativeness of the shared information. Furthermore, we show how this approach allows to bound the inherent error of the supervised-learning phase of FQI with the mutual information between shared and unshared information. Our algorithm, SCAlable Multi-agent FQI (SCAM-FQI), is then evaluated on a distributed decision-making problem. The empirical results align with our theoretical findings, supporting the effectiveness of SCAM-FQI in achieving a balance between scalability and policy performance.


Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Traditional robust methods in multi-agent reinforcement learning (MARL) often struggle against coordinated adversarial attacks in cooperative scenarios. To address this limitation, we propose the Wolfpack Adversarial Attack framework, inspired by wolf hunting strategies, which targets an initial agent and its assisting agents to disrupt cooperation. Additionally, we introduce the Wolfpack-Adversarial Learning for MARL (WALL) framework, which trains robust MARL policies to defend against the proposed Wolfpack attack by fostering system-wide collaboration. Experimental results underscore the devastating impact of the Wolfpack attack and the significant robustness improvements achieved by WALL.


A Multiagent Path Search Algorithm for Large-Scale Coalition Structure Generation

arXiv.org Artificial Intelligence

Coalition structure generation (CSG), i.e. the problem of optimally partitioning a set of agents into coalitions to maximize social welfare, is a fundamental computational problem in multiagent systems. This problem is important for many applications where small run times are necessary, including transportation and disaster response. In this paper, we develop SALDAE, a multiagent path finding algorithm for CSG that operates on a graph of coalition structures. Our algorithm utilizes a variety of heuristics and strategies to perform the search and guide it. It is an anytime algorithm that can handle large problems with hundreds and thousands of agents. We show empirically on nine standard value distributions, including disaster response and electric vehicle allocation benchmarks, that our algorithm enables a rapid finding of high-quality solutions and compares favorably with other state-of-the-art methods.


Learning to Solve the Min-Max Mixed-Shelves Picker-Routing Problem via Hierarchical and Parallel Decoding

arXiv.org Machine Learning

The Mixed-Shelves Picker Routing Problem (MSPRP) is a fundamental challenge in warehouse logistics, where pickers must navigate a mixed-shelves environment to retrieve SKUs efficiently. Traditional heuristics and optimization-based approaches struggle with scalability, while recent machine learning methods often rely on sequential decision-making, leading to high solution latency and suboptimal agent coordination. In this work, we propose a novel hierarchical and parallel decoding approach for solving the min-max variant of the MSPRP via multi-agent reinforcement learning. While our approach generates a joint distribution over agent actions, allowing for fast decoding and effective picker coordination, our method introduces a sequential action selection to avoid conflicts in the multi-dimensional action space. Experiments show state-of-the-art performance in both solution quality and inference speed, particularly for large-scale and out-of-distribution instances.


Cooperative Multi-Agent Planning with Adaptive Skill Synthesis

arXiv.org Artificial Intelligence

Despite much progress in training distributed artificial intelligence (AI), building cooperative multi-agent systems with multi-agent reinforcement learning (MARL) faces challenges in sample efficiency, interpretability, and transferability. Unlike traditional learning-based methods that require extensive interaction with the environment, large language models (LLMs) demonstrate remarkable capabilities in zero-shot planning and complex reasoning. However, existing LLM-based approaches heavily rely on text-based observations and struggle with the non-Markovian nature of multi-agent interactions under partial observability. We present COMPASS, a novel multi-agent architecture that integrates vision-language models (VLMs) with a dynamic skill library and structured communication for decentralized closed-loop decision-making. The skill library, bootstrapped from demonstrations, evolves via planner-guided tasks to enable adaptive strategies. COMPASS propagates entity information through multi-hop communication under partial observability. Evaluations on the improved StarCraft Multi-Agent Challenge (SMACv2) demonstrate COMPASS achieves up to 30\% higher win rates than state-of-the-art MARL algorithms in symmetric scenarios.


Game Theory Meets Large Language Models: A Systematic Survey

arXiv.org Artificial Intelligence

Game theory establishes a fundamental framework for analyzing strategic interactions among rational decision-makers. The rapid advancement of large language models (LLMs) has sparked extensive research exploring the intersection of these two fields. Specifically, game-theoretic methods are being applied to evaluate and enhance LLM capabilities, while LLMs themselves are reshaping classic game models. This paper presents a comprehensive survey of the intersection of these fields, exploring a bidirectional relationship from three perspectives: (1) Establishing standardized game-based benchmarks for evaluating LLM behavior; (2) Leveraging game-theoretic methods to improve LLM performance through algorithmic innovations; (3) Characterizing the societal impacts of LLMs through game modeling. Among these three aspects, we also highlight how the equilibrium analysis for traditional game models is impacted by LLMs' advanced language understanding, which in turn extends the study of game theory. Finally, we identify key challenges and future research directions, assessing their feasibility based on the current state of the field. By bridging theoretical rigor with emerging AI capabilities, this survey aims to foster interdisciplinary collaboration and drive progress in this evolving research area.


Asynchronous Cooperative Multi-Agent Reinforcement Learning with Limited Communication

arXiv.org Artificial Intelligence

Communication is crucial in cooperative multi-agent systems with partial observability, as it enables a better understanding of the environment and improves coordination. In extreme environments such as those underwater or in space, the frequency of communication between agents is often limited [1, 2]. For example, a satellite may not be able to reliably receive and react to messages from other satellites synchronously due to limited onboard power and communication delays. In these scenarios, agents aim to establish a communication protocol that allows them to operate independently while still receiving sufficient information to effectively coordinate with nearby agents. Multi-agent reinforcement learning (MARL) has emerged as a popular approach for addressing cooperative navigation challenges involving multiple agents.


Exact Leader Estimation: A New Approach for Distributed Differentiation

arXiv.org Artificial Intelligence

A novel strategy aimed at cooperatively differentiating a signal among multiple interacting agents is introduced, where none of the agents needs to know which agent is the leader, i.e. the one producing the signal to be differentiated. Every agent communicates only a scalar variable to its neighbors; except for the leader, all agents execute the same algorithm. The proposed strategy can effectively obtain derivatives up to arbitrary $m$-th order in a finite time under the assumption that the $(m+1)$-th derivative is bounded. The strategy borrows some of its structure from the celebrated homogeneous robust exact differentiator by A. Levant, inheriting its exact differentiation capability and robustness to measurement noise. Hence, the proposed strategy can be said to perform robust exact distributed differentiation. In addition, and for the first time in the distributed leader-observer literature, sampled-data communication and bounded measurement noise are considered, and corresponding steady-state worst-case accuracy bounds are derived. The effectiveness of the proposed strategy is verified numerically for second- and fourth-order systems, i.e., for estimating derivatives of up to first and third order, respectively.