

 Agent Societies


An active learning method for solving competitive multi-agent decision-making and control problems

arXiv.org Artificial Intelligence

We propose a scheme based on active learning to reconstruct the private strategies executed by a population of interacting agents and to predict an exact outcome of the underlying multi-agent interaction process, here identified as a stationary action profile. We envision a scenario where an external observer, endowed with a learning procedure, can make queries and observe the agents' reactions through private action-reaction mappings, whose collective fixed point corresponds to a stationary profile. By iteratively collecting informative data and updating parametric estimates of the action-reaction mappings, we establish sufficient conditions to assess the asymptotic properties of the proposed active learning methodology so that, if convergence happens, it can only be towards a stationary action profile. This fact has two main consequences: i) learning locally exact surrogates of the action-reaction mappings allows the external observer to succeed in its prediction task, and ii) since our assumptions are so general that a stationary profile is not even guaranteed to exist, the established sufficient conditions also act as certificates for the existence of such a desirable profile. Extensive numerical simulations involving typical competitive multi-agent control and decision-making problems illustrate the practical effectiveness of the proposed learning-based approach. The authors are with the IMT School for Advanced Studies Lucca, Piazza San Francesco 19, 55100, Lucca, Italy ({filippo.fabiani,
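The query, observe, and update loop described above can be illustrated with a toy two-agent example. This is a minimal sketch under our own assumptions, not the authors' algorithm: the reaction mappings `react_1` and `react_2` are hypothetical, and each surrogate is an affine model refit through the two most recent observations (a secant-type update). The fixed point of the surrogates becomes the next query, so if the queries converge, the limit satisfies the true reaction mappings.

```python
import math

# Hypothetical private action-reaction mappings (hidden from the observer).
def react_1(x2: float) -> float:
    return 0.5 * math.tanh(x2) + 1.0

def react_2(x1: float) -> float:
    return -0.4 * x1 + 0.1 * math.sin(x1) + 0.5

def observe(profile):
    """Query the agents with an action profile and observe their reactions."""
    x1, x2 = profile
    return react_1(x2), react_2(x1)

def affine_fit(u_prev, y_prev, u, y, slope_prev):
    """Affine surrogate y ~ slope * u + intercept through the two latest samples."""
    du = u - u_prev
    slope = (y - y_prev) / du if abs(du) > 1e-14 else slope_prev
    return slope, y - slope * u

def active_learning(x_a=(0.0, 0.0), x_b=(1.0, 0.2), tol=1e-10, max_iter=50):
    q_prev, q = x_a, x_b
    r_prev, r = observe(q_prev), observe(q)
    s1 = s2 = 0.0
    for _ in range(max_iter):
        # Agent 1 reacts to x2, agent 2 reacts to x1: refit both surrogates.
        s1, b1 = affine_fit(q_prev[1], r_prev[0], q[1], r[0], s1)
        s2, b2 = affine_fit(q_prev[0], r_prev[1], q[0], r[1], s2)
        # Next query = fixed point of x1 = s1*x2 + b1, x2 = s2*x1 + b2
        # (assumes s1 * s2 != 1).
        x1 = (s1 * b2 + b1) / (1.0 - s1 * s2)
        q_next = (x1, s2 * x1 + b2)
        step = math.hypot(q_next[0] - q[0], q_next[1] - q[1])
        q_prev, r_prev = q, r
        q, r = q_next, observe(q_next)
        if step < tol:
            break
    return q

x_star = active_learning()
print(x_star, observe(x_star))
```

Here the reactions at the returned profile reproduce the profile itself, i.e. it is a stationary action profile of the toy game.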


Scalable Multi-Agent Reinforcement Learning with General Utilities

arXiv.org Artificial Intelligence

Many decision-making problems take a form beyond the classic cumulative reward, such as apprenticeship learning [1], diverse skill discovery [2], pure exploration [3], and state marginal matching [4], among others. Such problems can be abstracted as reinforcement learning (RL) with general utilities [5, 6], which seeks a policy that maximizes a nonlinear function of the induced state-action occupancy measure. This generalizes standard RL, in which the objective is simply the inner product between the state-action occupancy measure induced by the policy and a policy-independent reward for each state-action pair. Beyond single-agent RL, consider the multi-agent problem, where different agents need to interact to obtain a favorable outcome by finding a decision policy that maximizes the global accumulation of all agents' general utilities.
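To make the distinction concrete, here is a minimal single-agent tabular sketch (our own illustration, not from the paper). It computes the normalized discounted state-action occupancy measure of a policy, then evaluates a standard linear objective (inner product with a reward) and one nonlinear general utility: the entropy of the state marginal, a common pure-exploration objective. The toy MDP and all names are hypothetical.

```python
import numpy as np

def occupancy_measure(P, pi, mu0, gamma):
    """Normalized discounted state-action occupancy measure lambda(s, a).

    P[s, a, s'] are transition probabilities, pi[s, a] = pi(a | s), and mu0 is the
    initial-state distribution. The result sums to 1.
    """
    n_states = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    # Solve d = (1 - gamma) * mu0 + gamma * P_pi^T d for the state occupancy d.
    d = (1.0 - gamma) * np.linalg.solve(np.eye(n_states) - gamma * P_pi.T, mu0)
    return d[:, None] * pi                  # lambda(s, a) = d(s) * pi(a | s)

def linear_utility(lam, r):
    """Standard RL objective: the inner product <lambda, r>."""
    return float(np.sum(lam * r))

def entropy_utility(lam):
    """A nonlinear (general) utility: entropy of the state marginal."""
    d = lam.sum(axis=1)
    d = d[d > 0]
    return float(-np.sum(d * np.log(d)))

# Toy 2-state, 2-action MDP: action 0 stays put, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0
P[0, 1, 1] = P[1, 1, 0] = 1.0
pi = np.full((2, 2), 0.5)               # uniform policy
mu0 = np.array([1.0, 0.0])
lam = occupancy_measure(P, pi, mu0, gamma=0.9)
r = np.array([[1.0, 0.0], [0.0, 0.0]])  # reward only for staying in state 0
print(linear_utility(lam, r), entropy_utility(lam))
```

Both objectives are functions of the same occupancy measure; only the first is linear in it, which is what makes the general-utility setting harder to optimize.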


Predator-prey survival pressure is sufficient to evolve swarming behaviors

arXiv.org Artificial Intelligence

Understanding how local interactions give rise to global collective behavior is of utmost importance in both biological and physical research. Traditional agent-based models often rely on static rules that fail to capture the dynamic strategies of the biological world. Reinforcement learning has been proposed as a solution, but most previous methods adopt handcrafted reward functions that implicitly or explicitly encourage the emergence of swarming behaviors. In this study, we propose a minimal predator-prey coevolution framework based on mixed cooperative-competitive multi-agent reinforcement learning, and adopt a reward function based solely on the fundamental survival pressure: prey receive a reward of $-1$ if caught by predators, while predators receive a reward of $+1$. Surprisingly, our analysis of this approach reveals a rich diversity of emergent behaviors for both prey and predators, including flocking and swirling behaviors for prey, as well as dispersion tactics, confusion, and marginal predation phenomena for predators. Overall, our study provides novel insights into the collective behavior of organisms and highlights potential applications in swarm robotics.
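The survival-pressure reward can be written as a small per-step function. This is a minimal sketch under assumptions the abstract does not specify: capture is hypothetically defined as a predator being within `capture_radius` of a prey, every predator within that radius receives $+1$, and agent names are unique across both teams.

```python
import math
from typing import Dict, Tuple

Position = Tuple[float, float]

def survival_rewards(prey: Dict[str, Position],
                     predators: Dict[str, Position],
                     capture_radius: float = 0.1) -> Dict[str, float]:
    """Per-step rewards: -1 for a caught prey, +1 for each predator that catches it."""
    rewards = {name: 0.0 for name in list(prey) + list(predators)}
    for prey_name, p in prey.items():
        # Assumed capture rule: any predator within capture_radius catches the prey.
        catchers = [name for name, q in predators.items()
                    if math.dist(p, q) <= capture_radius]
        if catchers:
            rewards[prey_name] = -1.0
            for name in catchers:
                rewards[name] += 1.0
    return rewards

print(survival_rewards({"prey0": (0.0, 0.0), "prey1": (5.0, 5.0)},
                       {"pred0": (0.05, 0.0)}))
```

No shaping terms (distances, alignment, cohesion) appear in the reward, which is the point of the minimal setup: any swarming must emerge from the capture signal alone.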


FoX: Formation-aware exploration in multi-agent reinforcement learning

arXiv.org Artificial Intelligence

Recently, deep multi-agent reinforcement learning (MARL) has gained significant popularity due to its success in various cooperative multi-agent tasks. However, exploration remains a challenging problem in MARL due to the partial observability of the agents and an exploration space that can grow exponentially with the number of agents. First, to address the scalability issue of the exploration space, we define a formation-based equivalence relation on the exploration space and aim to reduce the search space by exploring only meaningful states in different formations. Then, we propose a novel formation-aware exploration (FoX) framework that encourages partially observable agents to visit states in diverse formations by guiding them to be aware of their current formation based solely on their own observations. Numerical results show that the proposed FoX framework significantly outperforms state-of-the-art MARL algorithms on Google Research Football (GRF) and sparse StarCraft II Multi-Agent Challenge (SMAC) tasks.


E(3)-Equivariant Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Identification and analysis of symmetrical patterns in the natural world have led to significant discoveries across various scientific fields, such as the formulation of gravitational laws in physics and advances in the study of chemical structures. In this paper, we focus on exploiting Euclidean symmetries inherent in certain cooperative multi-agent reinforcement learning (MARL) problems and prevalent in many applications. We begin by formally characterizing a subclass of Markov games with a general notion of symmetries that admits the existence of symmetric optimal values and policies. Motivated by these properties, we design neural network architectures with symmetry constraints embedded as an inductive bias for multi-agent actor-critic methods. This inductive bias results in superior performance in various cooperative MARL benchmarks and impressive generalization capabilities, such as zero-shot learning and transfer learning in unseen scenarios with repeated symmetric patterns. The code is available at: https://github.com/dchen48/E3AC.
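The symmetry constraint can be checked numerically on toy functions. The sketch below is our own illustration, not the paper's architecture (see the linked repository for that): a critic that depends only on pairwise distances is invariant under E(3) (rotations, reflections, translations), while an actor whose outputs are weighted sums of relative position vectors, in the style of an EGNN message, is equivariant. Function names and the weighting are hypothetical.

```python
import numpy as np

def critic(pos):
    """Toy E(3)-invariant value: depends only on pairwise distances."""
    diff = pos[:, None, :] - pos[None, :, :]
    return float(np.sum(np.exp(-np.linalg.norm(diff, axis=-1))))

def actor(pos):
    """Toy E(3)-equivariant action: each agent's velocity is a weighted sum of
    relative position vectors x_i - x_j, with distance-based (invariant) weights."""
    diff = pos[:, None, :] - pos[None, :, :]            # x_i - x_j
    weights = np.exp(-np.linalg.norm(diff, axis=-1))    # invariant scalars
    return np.einsum("ij,ijk->ik", weights, diff)

rng = np.random.default_rng(0)
pos = rng.normal(size=(4, 3))                  # 4 agents in 3-D
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix in O(3)
t = rng.normal(size=3)                         # random translation
moved = pos @ Q.T + t                          # apply the E(3) transformation

print(np.isclose(critic(moved), critic(pos)))          # value unchanged
print(np.allclose(actor(moved), actor(pos) @ Q.T))     # actions rotate with the scene
```

Translations cancel in every difference `x_i - x_j`, and the orthogonal matrix preserves norms, which is why the weights stay fixed and the actions simply rotate.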


Strategic Decision-Making in Multi-Agent Domains: A Weighted Potential Dynamic Game Approach

arXiv.org Artificial Intelligence

In interactive multi-agent settings, decision-making complexity arises from agents' interconnected objectives. Dynamic game theory offers a formal framework for analyzing such intricacies. Yet, solving dynamic games and determining Nash equilibria pose computational challenges due to the need to solve coupled optimal control problems. To address this, our key idea is to leverage potential games, which are games with a potential function that allows for the computation of Nash equilibria by optimizing the potential function. We argue that dynamic potential games can effectively facilitate interactive decision-making in many multi-agent interactions. We identify structures in realistic multi-agent interactive scenarios that can be transformed into weighted potential dynamic games. We show that the open-loop Nash equilibria of the resulting weighted potential dynamic game can be obtained by solving a single optimal control problem. We demonstrate the effectiveness of the proposed method through various simulation studies, showing close proximity to feedback Nash equilibria and significant improvements in solve time compared to state-of-the-art game solvers.
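The key idea, that a Nash equilibrium can be found by minimizing one potential function, can be seen in a static two-player toy game. This is our own hypothetical example, not the paper's dynamic formulation (which works with trajectories and optimal control). Each player's cost changes under a unilateral deviation exactly as `W_i` times the change in `potential`, so the game is a weighted potential game; minimizing the potential by gradient descent gives a profile from which no player can improve alone.

```python
# Hypothetical positive weights of the weighted potential game.
W1, W2 = 2.0, 0.5

def cost_1(x1, x2):
    # The last term depends only on x2, so it does not affect player 1's choice.
    return W1 * ((x1 - 1.0) ** 2 + x1 * x2) + 3.0 * x2 ** 2

def cost_2(x1, x2):
    # The last term depends only on x1, so it does not affect player 2's choice.
    return W2 * ((x2 + 1.0) ** 2 + x1 * x2) - x1

def potential(x1, x2):
    return (x1 - 1.0) ** 2 + (x2 + 1.0) ** 2 + x1 * x2

def minimize_potential(lr=0.1, iters=2000):
    """Single optimization problem: plain gradient descent on the potential."""
    x1 = x2 = 0.0
    for _ in range(iters):
        g1 = 2.0 * (x1 - 1.0) + x2   # d(potential)/d(x1)
        g2 = 2.0 * (x2 + 1.0) + x1   # d(potential)/d(x2)
        x1, x2 = x1 - lr * g1, x2 - lr * g2
    return x1, x2

x1, x2 = minimize_potential()
print(x1, x2)
# No unilateral deviation lowers either player's own cost.
for d in (-0.5, -0.1, 0.1, 0.5):
    assert cost_1(x1 + d, x2) >= cost_1(x1, x2)
    assert cost_2(x1, x2 + d) >= cost_2(x1, x2)
```

Setting the gradient of the potential to zero gives the unique minimizer (2, -2), which is also each player's best response to the other.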


Tackling the Curse of Dimensionality in Large-scale Multi-agent LTL Task Planning via Poset Product

arXiv.org Artificial Intelligence

Linear Temporal Logic (LTL) formulas have been used to describe complex tasks for multi-agent systems, with both spatial and temporal constraints. However, since the planning complexity grows exponentially with the number of agents and the length of the task formula, existing applications are mostly limited to small artificial cases. To address this issue, a new planning algorithm is proposed for task formulas specified as sc-LTL formulas. It avoids two common bottlenecks in model-checking-based planning methods: (i) the direct translation of the complete task formula into the associated Büchi automaton, and (ii) the synchronized product between the Büchi automaton and the transition models of all agents. In particular, each conjunctive sub-formula is first converted to the associated R-posets as an abstraction of the temporal dependencies among the subtasks. Then, an efficient algorithm is proposed to compute the product of these R-posets, which retains their dependencies and resolves potential conflicts. Furthermore, the proposed approach is applied to dynamic scenes where new tasks are generated online. It can derive the first valid plan with polynomial time and memory complexity with respect to the system size and the formula length. Our method can plan for task formulas with a length of more than 60 and a system with more than 35 agents, while most existing methods fail at a formula length of 20. The proposed method is validated on large fleets of service robots in both simulation and hardware experiments.


iPLAN: Intent-Aware Planning in Heterogeneous Traffic via Distributed Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Navigating safely and efficiently in dense and heterogeneous traffic scenarios is challenging for autonomous vehicles (AVs) due to their inability to infer the behaviors or intentions of nearby drivers. In this work, we introduce a distributed multi-agent reinforcement learning (MARL) algorithm that can predict trajectories and intents in dense and heterogeneous traffic scenarios. Our approach for intent-aware planning, iPLAN, allows agents to infer nearby drivers' intents solely from their local observations. We model two distinct incentives for agents' strategies: a Behavioral Incentive for high-level decision-making based on driving behavior or personality, and an Instant Incentive for collision-avoiding motion planning based on the current traffic state. Our approach enables agents to infer their opponents' behavior incentives and integrate this inferred information into their decision-making and motion-planning processes. We perform experiments in two simulation environments, Non-Cooperative Navigation and Heterogeneous Highway. In Heterogeneous Highway, compared with centralized training decentralized execution (CTDE) MARL baselines such as QMIX and MAPPO, our method yields 4.3% and 38.4% higher episodic reward in mild and chaotic traffic, respectively, with a 48.1% higher success rate and 80.6% longer survival time in chaotic traffic. We also compare with a decentralized training decentralized execution (DTDE) baseline, IPPO, and demonstrate 12.7% and 6.3% higher episodic reward in mild and chaotic traffic, respectively, a 25.3% higher success rate, and 13.7% longer survival time.


Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In the Game of Hanabi

arXiv.org Artificial Intelligence

Cooperative Multi-agent Reinforcement Learning (MARL) algorithms with Zero-Shot Coordination (ZSC) have gained significant attention in recent years. ZSC refers to the ability of agents to coordinate zero-shot (without additional interaction experience) with independently trained agents. While ZSC is crucial for cooperative MARL agents, it might not be achievable for complex tasks and changing environments. Agents also need to adapt and improve their performance with minimal interaction with other agents. In this work, we show empirically that state-of-the-art ZSC algorithms perform poorly when paired with agents trained with different learning methods, and that they require millions of interaction samples to adapt to these new partners. To investigate this issue, we formally define a framework based on Hanabi, a popular cooperative multi-agent game, to evaluate the adaptability of MARL methods. In particular, we create a diverse set of pre-trained agents and define a new metric called adaptation regret that measures an agent's ability to efficiently adapt and improve its coordination performance, on top of its ZSC performance, when paired with a held-out pool of partners. After evaluating several SOTA algorithms using our framework, our experiments reveal that naive Independent Q-Learning (IQL) agents in most cases adapt as quickly as the SOTA ZSC algorithm Off-Belief Learning (OBL). This finding raises an interesting research question: how can we design MARL algorithms with both high ZSC performance and the capability to adapt quickly to unseen partners? As a first step, we study the role of different hyper-parameters and design choices on the adaptability of current MARL algorithms. Our experiments show that two categories of hyper-parameters, those controlling training data diversity and those controlling the optimization process, have a significant impact on the adaptability of Hanabi agents.


Scalable Multi-agent Covering Option Discovery based on Kronecker Graphs

arXiv.org Artificial Intelligence

Covering skill (a.k.a. option) discovery has been developed to improve RL exploration in single-agent scenarios with sparse reward signals by connecting the most distant states in the embedding space provided by the Fiedler vector of the state transition graph. Since the joint state space grows exponentially with the number of agents in multi-agent systems, existing methods that still rely on single-agent skill discovery either become prohibitive or fail to directly discover joint skills that improve the connectivity of the joint state space. In this paper, we propose a multi-agent skill discovery method that admits easy decomposition. Our key idea is to approximate the joint state space as a Kronecker graph, from which we can directly estimate its Fiedler vector using the Laplacian spectra of the individual agents' transition graphs. Furthermore, since directly computing the Laplacian spectrum is intractable for tasks with infinite-scale state spaces, we propose a deep learning extension of our method that estimates eigenfunctions through NN-based representation learning techniques. Evaluation on multi-agent tasks built with simulators such as MuJoCo shows that the proposed algorithm can successfully identify multi-agent skills and significantly outperforms state-of-the-art methods.
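Why individual spectra suffice can be seen in a small numerical sketch. Here we assume the joint transition graph is modelled as the Cartesian product of two agents' graphs, whose Laplacian is the Kronecker sum $L_1 \otimes I + I \otimes L_2$; the paper's exact construction may differ, and the path graphs are hypothetical. Because the eigenvalues of a Kronecker sum are pairwise sums of the individual eigenvalues, the joint Fiedler value is the smaller of the two individual ones, and the joint Fiedler vector is a Kronecker product of an individual Fiedler vector with a constant vector.

```python
import numpy as np

def path_laplacian(n):
    """Graph Laplacian L = D - A of a path graph with n nodes."""
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def fiedler(L):
    """Second-smallest Laplacian eigenvalue and its eigenvector."""
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return vals[1], vecs[:, 1]

n1, n2 = 4, 6
L1, L2 = path_laplacian(n1), path_laplacian(n2)
# Assumed joint graph: Cartesian product, Laplacian = Kronecker sum.
L_joint = np.kron(L1, np.eye(n2)) + np.kron(np.eye(n1), L2)

lam1, f1 = fiedler(L1)
lam2, f2 = fiedler(L2)
# Joint Fiedler pair built from the individual spectra only.
if lam1 <= lam2:
    lam_joint, f_joint = lam1, np.kron(f1, np.ones(n2))
else:
    lam_joint, f_joint = lam2, np.kron(np.ones(n1), f2)

# Candidate covering-option endpoints: joint states with extreme Fiedler entries.
start, goal = int(np.argmin(f_joint)), int(np.argmax(f_joint))
print(lam_joint, start, goal)
```

Only the small per-agent matrices need an eigendecomposition; the full joint Laplacian is built here solely to check the result.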