Agent Societies
First-Choice Maximality Meets Ex-ante and Ex-post Fairness
Guo, Xiaoxi, Sikdar, Sujoy, Xia, Lirong, Cao, Yongzhi, Wang, Hanpin
For the assignment problem where multiple indivisible items are allocated to a group of agents given their ordinal preferences, we design randomized mechanisms that satisfy first-choice maximality (FCM), i.e., maximizing the number of agents assigned their first choices, together with Pareto efficiency (PE). Our mechanisms also provide guarantees of ex-ante and ex-post fairness. The generalized eager Boston mechanism is ex-ante envy-free, and ex-post envy-free up to one item (EF1). The generalized probabilistic Boston mechanism is also ex-post EF1, and satisfies ex-ante efficiency instead of fairness. We also show that no strategyproof mechanism satisfies ex-post PE, EF1, and FCM simultaneously. In doing so, we expand the frontiers of simultaneously providing efficiency and both ex-ante and ex-post fairness guarantees for the assignment problem.
Multi-agent Continual Coordination via Progressive Task Contextualization
Yuan, Lei, Li, Lihe, Zhang, Ziqian, Zhang, Fuxiang, Guan, Cong, Yu, Yang
Cooperative Multi-agent Reinforcement Learning (MARL) has attracted prominent attention in recent years [1], and achieved great progress in multiple aspects, like path finding [2], active voltage control [3], and dynamic algorithm configuration [4]. Among the multitudinous methods, researchers, on the one hand, focus on facilitating coordination ability via solving specific challenges, including non-stationarity [5], credit assignment [6], and scalability [7]. Other works, on the other hand, investigate the cooperative MARL from multiple aspects, like efficient communication [8], zero-shot coordination (ZSC) [9], policy robustness [10], etc. A lot of methods emerge as promising solutions for different scenarios, including policy-based ones [11,12], value-based series [13,14], and many other variants, showing remarkable coordination ability in a wide range of tasks like SMAC [15]. Despite the great success, the mainstream cooperative MARL methods are still restricted to being trained in one single task or multiple tasks simultaneously, assuming that the agents have access to data from all tasks at all times, which is unrealistic for physical agents in the real world that can only attend to one task at a time. Continual Reinforcement Learning plays a promising role in the mentioned problem [16], where the agent aims to avoid catastrophic forgetting, as well as enable knowledge transfer to new tasks (a.k.a.
Robust Multi-agent Communication via Multi-view Message Certification
Yuan, Lei, Jiang, Tao, Li, Lihe, Chen, Feng, Zhang, Zongzhang, Yu, Yang
Many multi-agent scenarios require message sharing among agents to promote coordination, hastening the robustness of multi-agent communication when policies are deployed in a message perturbation environment. Major relevant works tackle this issue under specific assumptions, like a limited number of message channels would sustain perturbations, limiting the efficiency in complex scenarios. In this paper, we take a further step addressing this issue by learning a robust multi-agent communication policy via multi-view message certification, dubbed CroMAC. Agents trained under CroMAC can obtain guaranteed lower bounds on state-action values to identify and choose the optimal action under a worst-case deviation when the received messages are perturbed. Concretely, we first model multi-agent communication as a multi-view problem, where every message stands for a view of the state. Then we extract a certificated joint message representation by a multi-view variational autoencoder (MVAE) that uses a product-of-experts inference network. For the optimization phase, we do perturbations in the latent space of the state for a certificate guarantee. Then the learned joint message representation is used to approximate the certificated state representation during training. Extensive experiments in several cooperative multi-agent benchmarks validate the effectiveness of the proposed CroMAC.
Improving LaCAM for Scalable Eventually Optimal Multi-Agent Pathfinding
This study extends the recently-developed LaCAM algorithm for multi-agent pathfinding (MAPF). LaCAM is a sub-optimal search-based algorithm that uses lazy successor generation to dramatically reduce the planning effort. We present two enhancements. First, we propose its anytime version, called LaCAM*, which eventually converges to optima, provided that solution costs are accumulated transition costs. Second, we improve the successor generation to quickly obtain initial solutions. Exhaustive experiments demonstrate their utility. For instance, LaCAM* sub-optimally solved 99% of the instances retrieved from the MAPF benchmark, where the number of agents varied up to a thousand, within ten seconds on a standard desktop PC, while ensuring eventual convergence to optima; developing a new horizon of MAPF algorithms.
Stackelberg Games for Learning Emergent Behaviors During Competitive Autocurricula
Yang, Boling, Zheng, Liyuan, Ratliff, Lillian J., Boots, Byron, Smith, Joshua R.
Autocurricular training is an important sub-area of multi-agent reinforcement learning~(MARL) that allows multiple agents to learn emergent skills in an unsupervised co-evolving scheme. The robotics community has experimented autocurricular training with physically grounded problems, such as robust control and interactive manipulation tasks. However, the asymmetric nature of these tasks makes the generation of sophisticated policies challenging. Indeed, the asymmetry in the environment may implicitly or explicitly provide an advantage to a subset of agents which could, in turn, lead to a low-quality equilibrium. This paper proposes a novel game-theoretic algorithm, Stackelberg Multi-Agent Deep Deterministic Policy Gradient (ST-MADDPG), which formulates a two-player MARL problem as a Stackelberg game with one player as the `leader' and the other as the `follower' in a hierarchical interaction structure wherein the leader has an advantage. We first demonstrate that the leader's advantage from ST-MADDPG can be used to alleviate the inherent asymmetry in the environment. By exploiting the leader's advantage, ST-MADDPG improves the quality of a co-evolution process and results in more sophisticated and complex strategies that work well even against an unseen strong opponent.
A framework for the emergence and analysis of language in social learning agents
Wieczorek, Tobias J., Tchumatchenko, Tatjana, Carvajal, Carlos Wert, Eggl, Maximilian F.
Artificial neural networks (ANNs) are increasingly used as research models, but questions remain about their generalizability and representational invariance. Biological neural networks under social constraints evolved to enable communicable representations, demonstrating generalization capabilities. This study proposes a communication protocol between cooperative agents to analyze the formation of individual and shared abstractions and their impact on task performance. This communication protocol aims to mimic language features by encoding high-dimensional information through low-dimensional representation. Using grid-world mazes and reinforcement learning, teacher ANNs pass a compressed message to a student ANN for better task completion. Through this, the student achieves a higher goal-finding rate and generalizes the goal location across task worlds. Further optimizing message content to maximize student reward improves information encoding, suggesting that an accurate representation in the space of messages requires bi-directional input. This highlights the role of language as a common representation between agents and its implications on generalization capabilities.
System Neural Diversity: Measuring Behavioral Heterogeneity in Multi-Agent Learning
Bettini, Matteo, Shankar, Ajay, Prorok, Amanda
Evolutionary science provides evidence that diversity confers resilience. Yet, traditional multi-agent reinforcement learning techniques commonly enforce homogeneity to increase training sample efficiency. When a system of learning agents is not constrained to homogeneous policies, individual agents may develop diverse behaviors, resulting in emergent complementarity that benefits the system. Despite this feat, there is a surprising lack of tools that measure behavioral diversity in systems of learning agents. Such techniques would pave the way towards understanding the impact of diversity in collective resilience and performance. In this paper, we introduce System Neural Diversity (SND): a measure of behavioral heterogeneity for multi-agent systems where agents have stochastic policies. %over a continuous state space. We discuss and prove its theoretical properties, and compare it with alternate, state-of-the-art behavioral diversity metrics used in cross-disciplinary domains. Through simulations of a variety of multi-agent tasks, we show how our metric constitutes an important diagnostic tool to analyze latent properties of behavioral heterogeneity. By comparing SND with task reward in static tasks, where the problem does not change during training, we show that it is key to understanding the effectiveness of heterogeneous vs homogeneous agents. In dynamic tasks, where the problem is affected by repeated disturbances during training, we show that heterogeneous agents are first able to learn specialized roles that allow them to cope with the disturbance, and then retain these roles when the disturbance is removed. SND allows a direct measurement of this latent resilience, while other proxies such as task performance (reward) fail to.
On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring
Foster, Dylan J., Foster, Dean P., Golowich, Noah, Rakhlin, Alexander
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees, and how these considerations change as we move from few to many agents. We study this question in a general framework for interactive decision making with multiple agents, encompassing Markov games with function approximation and normal-form games with bandit feedback. We focus on equilibrium computation, in which a centralized learning algorithm aims to compute an equilibrium by controlling multiple agents that interact with an unknown environment. Our main contributions are: - We provide upper and lower bounds on the optimal sample complexity for multi-agent decision making based on a multi-agent generalization of the Decision-Estimation Coefficient, a complexity measure introduced by Foster et al. (2021) in the single-agent counterpart to our setting. Compared to the best results for the single-agent setting, our bounds have additional gaps. We show that no "reasonable" complexity measure can close these gaps, highlighting a striking separation between single and multiple agents. - We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making, but with hidden (unobserved) rewards, a framework that subsumes variants of the partial monitoring problem. As a consequence, we characterize the statistical complexity for hidden-reward interactive decision making to the best extent possible. Building on this development, we provide several new structural results, including 1) conditions under which the statistical complexity of multi-agent decision making can be reduced to that of single-agent, and 2) conditions under which the so-called curse of multiple agents can be avoided.
Strategic Resource Selection with Homophilic Agents
Harder, Jonathan Gadea, Krogmann, Simon, Lenzner, Pascal, Skopalik, Alexander
The strategic selection of resources by selfish agents is a classic research direction, with Resource Selection Games and Congestion Games as prominent examples. In these games, agents select available resources and their utility then depends on the number of agents using the same resources. This implies that there is no distinction between the agents, i.e., they are anonymous. We depart from this very general setting by proposing Resource Selection Games with heterogeneous agents that strive for joint resource usage with similar agents. So, instead of the number of other users of a given resource, our model considers agents with different types and the decisive feature is the fraction of same-type agents among the users. More precisely, similarly to Schelling Games, there is a tolerance threshold $\tau \in [0,1]$ which specifies the agents' desired minimum fraction of same-type agents on a resource. Agents strive to select resources where at least a $\tau$-fraction of those resources' users have the same type as themselves. For $\tau=1$, our model generalizes Hedonic Diversity Games with a peak at $1$. For our general model, we consider the existence and quality of equilibria and the complexity of maximizing social welfare. Additionally, we consider a bounded rationality model, where agents can only estimate the utility of a resource, since they only know the fraction of same-type agents on a given resource, but not the exact numbers. Thus, they cannot know the impact a strategy change would have on a target resource. Interestingly, we show that this type of bounded rationality yields favorable game-theoretic properties and specific equilibria closely approximate equilibria of the full knowledge setting.
Learning to Seek: Multi-Agent Online Source Seeking Against Non-Stochastic Disturbances
Du, Bin, Qian, Kun, Claudel, Christian, Sun, Dengfeng
This paper proposes to leverage the emerging~learning techniques and devise a multi-agent online source {seeking} algorithm under unknown environment. Of particular significance in our problem setups are: i) the underlying environment is not only unknown, but dynamically changing and also perturbed by two types of non-stochastic disturbances; and ii) a group of agents is deployed and expected to cooperatively seek as many sources as possible. Correspondingly, a new technique of discounted Kalman filter is developed to tackle with the non-stochastic disturbances, and a notion of confidence bound in polytope nature is utilized~to aid the computation-efficient cooperation among~multiple agents. With standard assumptions on the unknown environment as well as the disturbances, our algorithm is shown to achieve sub-linear regrets under the two~types of non-stochastic disturbances; both results are comparable to the state-of-the-art. Numerical examples on a real-world pollution monitoring application are provided to demonstrate the effectiveness of our algorithm.