Agent Societies
Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment
Zhou, Tianze, Zhang, Fubiao, Shao, Kun, Li, Kai, Huang, Wenhan, Luo, Jun, Wang, Weixun, Yang, Yaodong, Mao, Hangyu, Wang, Bin, Li, Dong, Liu, Wulong, Hao, Jianye
Extending transfer learning to cooperative multi-agent reinforcement learning (MARL) has recently received much attention. In contrast to the single-agent setting, the coordination indispensable in cooperative MARL constrains each agent's policy. However, existing transfer methods focus exclusively on agent policy and ignores coordination knowledge. We propose a new architecture that realizes robust coordination knowledge transfer through appropriate decomposition of the overall coordination into several coordination patterns. We use a novel mixing network named level-adaptive QTransformer (LA-QTransformer) to realize agent coordination that considers credit assignment, with appropriate coordination patterns for different agents realized by a novel level-adaptive Transformer (LA-Transformer) dedicated to the transfer of coordination knowledge. In addition, we use a novel agent network named Population Invariant agent with Transformer (PIT) to realize the coordination transfer in more varieties of scenarios. Extensive experiments in StarCraft II micro-management show that LA-QTransformer together with PIT achieves superior performance compared with state-of-the-art baselines.
Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning
Li, Jiahui, Kuang, Kun, Wang, Baoxiang, Liu, Furui, Chen, Long, Wu, Fei, Xiao, Jun
Centralized Training with Decentralized Execution (CTDE) has been a popular paradigm in cooperative Multi-Agent Reinforcement Learning (MARL) settings and is widely used in many real applications. One of the major challenges in the training process is credit assignment, which aims to deduce the contributions of each agent according to the global rewards. Existing credit assignment methods focus on either decomposing the joint value function into individual value functions or measuring the impact of local observations and actions on the global value function. These approaches lack a thorough consideration of the complicated interactions among multiple agents, leading to an unsuitable assignment of credit and subsequently mediocre results on MARL. We propose Shapley Counterfactual Credit Assignment, a novel method for explicit credit assignment which accounts for the coalition of agents. Specifically, Shapley Value and its desired properties are leveraged in deep MARL to credit any combinations of agents, which grants us the capability to estimate the individual credit for each agent. Despite this capability, the main technical difficulty lies in the computational complexity of Shapley Value who grows factorially as the number of agents. We instead utilize an approximation method via Monte Carlo sampling, which reduces the sample complexity while maintaining its effectiveness. We evaluate our method on StarCraft II benchmarks across different scenarios. Our method outperforms existing cooperative MARL algorithms significantly and achieves the state-of-the-art, with especially large margins on tasks with more severe difficulties.
SHAQ: Incorporating Shapley Value Theory into Q-Learning for Multi-Agent Reinforcement Learning
Wang, Jianhong, Wang, Jinxin, Zhang, Yuan, Gu, Yunjie, Kim, Tae-Kyun
Value factorisation proves to be a very useful technique in multi-agent reinforcement learning (MARL), but the underlying mechanism is not yet fully understood. This paper explores a theoretic basis for value factorisation. We generalise the Shapley value in the coalitional game theory to a Markov convex game (MCG) and use it to guide value factorisation in MARL. We show that the generalised Shapley value possesses several features such as (1) accurate estimation of the maximum global value, (2) fairness in the factorisation of the global value, and (3) being sensitive to dummy agents. The proposed theory yields a new learning algorithm called Sharpley Q-learning (SHAQ), which inherits the important merits of ordinary Q-learning but extends it to MARL. In comparison with prior-arts, SHAQ has a much weaker assumption (MCG) that is more compatible with real-world problems, but has superior explainability and performance in many cases. We demonstrated SHAQ and verified the theoretic claims on Predator-Prey and StarCraft Multi-Agent Challenge (SMAC).
PROPm Allocations of Indivisible Goods to Multiple Agents
Baklanov, Artem, Garimidi, Pranav, Gkatzelis, Vasilis, Schoepflin, Daniel
We study the classic problem of fairly allocating a set of indivisible goods among a group of agents, and focus on the notion of approximate proportionality known as PROPm. Prior work showed that there exists an allocation that satisfies this notion of fairness for instances involving up to five agents, but fell short of proving that this is true in general. We extend this result to show that a PROPm allocation is guaranteed to exist for all instances, independent of the number of agents or goods. Our proof is constructive, providing an algorithm that computes such an allocation and, unlike prior work, the running time of this algorithm is polynomial in both the number of agents and the number of goods.
Cooperative Multi-Agent Path Finding: Beyond Path Planning and Collision Avoidance
Greshler, Nir, Gordon, Ofir, Salzman, Oren, Shimkin, Nahum
We introduce the Cooperative Multi-Agent Path Finding (Co-MAPF) problem, an extension to the classical MAPF problem, where cooperative behavior is incorporated. In this setting, a group of autonomous agents operate in a shared environment and have to complete cooperative tasks while avoiding collisions with the other agents in the group. This extension naturally models many real-world applications, where groups of agents are required to collaborate in order to complete a given task. To this end, we formalize the Co-MAPF problem and introduce Cooperative Conflict-Based Search (Co-CBS), a CBS-based algorithm for solving the problem optimally for a wide set of Co-MAPF problems. Co-CBS uses a cooperation-planning module integrated into CBS such that cooperation planning is decoupled from path planning. Finally, we present empirical results on several MAPF benchmarks demonstrating our algorithm's properties.
Guaranteeing Maximin Shares: Some Agents Left Behind
Hosseini, Hadi, Searns, Andrew
The maximin share (MMS) guarantee is a desirable fairness notion for allocating indivisible goods. While MMS allocations do not always exist, several approximation techniques have been developed to ensure that all agents receive a fraction of their maximin share. We focus on an alternative approximation notion, based on the population of agents, that seeks to guarantee MMS for a fraction of agents. We show that no optimal approximation algorithm can satisfy more than a constant number of agents, and discuss the existence and computation of MMS for all but one agent and its relation to approximate MMS guarantees. We then prove the existence of allocations that guarantee MMS for $\frac{2}{3}$ of agents, and devise a polynomial time algorithm that achieves this bound for up to nine agents. A key implication of our result is the existence of allocations that guarantee $\text{MMS}^{\lceil{3n/2}\rceil}$, i.e., the value that agents receive by partitioning the goods into $\lceil{\frac{3}{2}n}\rceil$ bundles, improving the best known guarantee of $\text{MMS}^{2n-2}$. Finally, we provide empirical experiments using synthetic data.
CFR-MIX: Solving Imperfect Information Extensive-Form Games with Combinatorial Action Space
Li, Shuxin, Zhang, Youzhi, Wang, Xinrun, Xue, Wanqi, An, Bo
In many real-world scenarios, a team of agents coordinate with each other to compete against an opponent. The challenge of solving this type of game is that the team's joint action space grows exponentially with the number of agents, which results in the inefficiency of the existing algorithms, e.g., Counterfactual Regret Minimization (CFR). To address this problem, we propose a new framework of CFR: CFR-MIX. Firstly, we propose a new strategy representation that represents a joint action strategy using individual strategies of all agents and a consistency relationship to maintain the cooperation between agents. To compute the equilibrium with individual strategies under the CFR framework, we transform the consistency relationship between strategies to the consistency relationship between the cumulative regret values. Furthermore, we propose a novel decomposition method over cumulative regret values to guarantee the consistency relationship between the cumulative regret values. Finally, we introduce our new algorithm CFR-MIX which employs a mixing layer to estimate cumulative regret values of joint actions as a non-linear combination of cumulative regret values of individual actions. Experimental results show that CFR-MIX outperforms existing algorithms on various games significantly.
Mean Field Games Flock! The Reinforcement Learning Way
Perrin, Sarah, Lauriรจre, Mathieu, Pรฉrolat, Julien, Geist, Matthieu, รlie, Romuald, Pietquin, Olivier
We present a method enabling a large number of agents to learn how to flock, which is a natural behavior observed in large populations of animals. This problem has drawn a lot of interest but requires many structural assumptions and is tractable only in small dimensions. We phrase this problem as a Mean Field Game (MFG), where each individual chooses its acceleration depending on the population behavior. Combining Deep Reinforcement Learning (RL) and Normalizing Flows (NF), we obtain a tractable solution requiring only very weak assumptions. Our algorithm finds a Nash Equilibrium and the agents adapt their velocity to match the neighboring flock's average one. We use Fictitious Play and alternate: (1) computing an approximate best response with Deep RL, and (2) estimating the next population distribution with NF. We show numerically that our algorithm learn multi-group or high-dimensional flocking with obstacles.
How Can Robots Trust Each Other? A Relative Needs Entropy Based Trust Assessment Models
Yang, Qin, Parasuraman, Ramviyas
Cooperation in multi-agent and multi-robot systems can help agents build various formations, shapes, and patterns presenting corresponding functions and purposes adapting to different situations. Relationship between agents such as their spatial proximity and functional similarities could play a crucial role in cooperation between agents. Trust level between agents is an essential factor in evaluating their relationships' reliability and stability, much as people do. This paper proposes a new model called Relative Needs Entropy (RNE) to assess trust between robotic agents. RNE measures the distance of needs distribution between individual agents or groups of agents. To exemplify its utility, we implement and demonstrate our trust model through experiments simulating a heterogeneous multi-robot grouping task in a persistent urban search and rescue mission consisting of tasks at two levels of difficulty. The results suggest that RNE trust-Based grouping of robots can achieve better performance and adaptability for diverse task execution compared to the state-of-the-art energy-based or distance-based grouping models.
Scalable, Decentralized Multi-Agent Reinforcement Learning Methods Inspired by Stigmergy and Ant Colonies
Bolstering multi-agent learning algorithms to tackle complex coordination and control tasks has been a long-standing challenge of on-going research. Numerous methods have been proposed to help reduce the effects of non-stationarity and unscalability. In this work, we investigate a novel approach to decentralized multi-agent learning and planning that attempts to address these two challenges. In particular, this method is inspired by the cohesion, coordination, and behavior of ant colonies. As a result, these algorithms are designed to be naturally scalable to systems with numerous agents. While no optimality is guaranteed, the method is intended to work well in practice and scale better in efficacy with the number of agents present than others. The approach combines single-agent RL and an ant-colony-inspired decentralized, stigmergic algorithm for multi-agent path planning and environment modification. Specifically, we apply this algorithm in a setting where agents must navigate to a goal location, learning to push rectangular boxes into holes to yield new traversable pathways. It is shown that while the approach yields promising success in this particular environment, it may not be as easily generalized to others. The algorithm designed is notably scalable to numerous agents but is limited in its performance due to its relatively simplistic, rule-based approach. Furthermore, the composability of RL-trained policies is called into question, where, while policies are successful in their training environments, applying trained policies to a larger-scale, multi-agent framework results in unpredictable behavior.