cooperator
- Asia > China > Jiangsu Province > Yancheng (0.04)
- North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (0.93)
- Research Report > New Finding (0.93)
Supplementary Material
We provide additional results for EGTA applied to networked MARL system control for CPR management.

Restraint percentages under different regeneration rates
The heatmaps in Figure 7 (A-C) highlight the differences in restraint percentage for different values of α as the regeneration rate is changed from high (0.1) to low. In the case where agents are completely self-interested (α = 0), shown in (A), the majority of algorithms without communication display very low levels of restraint for all rates of regeneration. The orange ovals in these diagrams indicate which system configurations correspond to the highest expected payoff for all agents.

Schelling diagrams using a different parameterisation
An alternative parameterisation for a Schelling diagram is to plot the payoffs for a particular agent (cooperating or defecting) against the number of other cooperators on the x-axis, instead of the total number of cooperators.
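The alternative Schelling-diagram parameterisation described above can be illustrated with a toy public-goods payoff; the group size, multiplier, and cost below are assumptions for illustration only, not values from the paper.

```python
# Toy illustration of a Schelling diagram parameterised by the number of
# OTHER cooperators (x-axis), rather than the total number of cooperators.
# All parameter values are hypothetical.

N = 5        # group size (assumption)
R = 2.0      # public-good multiplier (assumption)
COST = 1.0   # cost of cooperating (assumption)

def payoff(focal_cooperates: bool, other_cooperators: int) -> float:
    """Payoff of a focal agent, given how many of the N-1 others cooperate."""
    n_coop = other_cooperators + (1 if focal_cooperates else 0)
    share = R * COST * n_coop / N       # everyone receives an equal share
    return share - (COST if focal_cooperates else 0.0)

# x-axis of the Schelling diagram: number of other cooperators, 0 .. N-1.
coop_curve = [payoff(True, k) for k in range(N)]
defect_curve = [payoff(False, k) for k in range(N)]
```

Because R < N here, the defector curve sits above the cooperator curve at every x — the signature of a social dilemma in either parameterisation.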
Realistic gossip in Trust Game on networks: the GODS model
Majewski, Jan, Giardini, Francesca
Gossip has been shown to be a relatively efficient solution to problems of cooperation in reputation-based systems of exchange, but many studies do not conceptualize gossip realistically, often assuming near-perfect information or broadcast-like dynamics of its spread. To address this problem, we developed an agent-based model that pairs realistic gossip processes with different variants of the Trust Game. The results show that cooperators suffer when local interactions govern the spread of gossip, because they cannot discriminate against defectors. Realistic gossiping increases the overall amount of resources but is more likely to promote defection. Moreover, even partner selection through dynamic networks can lead to high payoff inequalities among agent types. Cooperators face a choice between outcompeting defectors and overall growth. By blending direct and indirect reciprocity with reputations, we show that gossiping increases the efficiency of cooperation by an order of magnitude.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Poland > Masovia Province > Warsaw (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (2 more...)
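A minimal sketch of the local-gossip dynamic described in the abstract above (not the authors' GODS implementation): the ring topology and deterministic neighbour-to-neighbour transmission are assumptions, chosen to show why locally spread reputation information never reaches distant agents, unlike the broadcast assumption.

```python
# Local gossip spread on a network vs. the broadcast assumption:
# a reputation report only travels along edges, so after a few rounds
# distant agents remain uninformed and cannot discriminate against defectors.

def spread_gossip(adj, source, rounds):
    """Each round, every informed agent tells all of its neighbours."""
    informed = {source}
    for _ in range(rounds):
        informed |= {nb for a in informed for nb in adj[a]}
    return informed

# Ring network of 8 agents (hypothetical toy topology).
n = 8
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

local = spread_gossip(adj, source=0, rounds=2)   # local gossip, 2 rounds
broadcast = set(range(n))                        # broadcast assumption
```

After two rounds the locally spread gossip covers only agents within distance 2 of the source, while the broadcast model assumes everyone already knows.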
Improving Human-AI Coordination through Online Adversarial Training and Generative Models
Chaudhary, Paresh, Liang, Yancheng, Chen, Daphne, Du, Simon S., Jaques, Natasha
Being able to cooperate with diverse humans is an important component of many economically valuable AI tasks, from household robotics to autonomous driving. However, generalizing to novel humans requires training on data that captures the diversity of human behaviors. Adversarial training is a promising method that allows dynamic data generation and ensures that agents are robust. It creates a feedback loop in which the agent's performance influences the generation of new adversarial data, which can be used immediately to train the agent. However, adversarial training is difficult to apply in a cooperative task: how can we train an adversarial cooperator? We propose a novel strategy that combines a pretrained generative model, used to simulate valid cooperative agent policies, with adversarial training to maximize regret. We call our method GOAT: Generative Online Adversarial Training. In this framework, GOAT dynamically searches the latent space of the generative model for coordination strategies on which the learning policy, the Cooperator agent, underperforms. GOAT enables better generalization by exposing the Cooperator to various challenging interaction scenarios. We maintain realistic coordination strategies by keeping the generative model frozen, thus avoiding adversarial exploitation. We evaluate GOAT with real human partners, and the results demonstrate state-of-the-art performance on the Overcooked benchmark, highlighting its effectiveness in generalizing to diverse human behaviors.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Washington > King County > Seattle (0.04)
- Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)
- Asia > China > Jiangsu Province > Yancheng (0.04)
- Research Report > Promising Solution (0.34)
- Research Report > New Finding (0.34)
- Leisure & Entertainment > Games (1.00)
- Information Technology (0.88)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Natural Language (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)
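The latent-space search that the GOAT abstract describes can be sketched schematically; the decoder, the payoff function, and the random-search procedure below are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

# Schematic sketch of the GOAT idea: a frozen generative model maps a latent z
# to a partner policy, and we search the latent space for the z on which the
# current Cooperator performs worst (maximum regret). The "decoder" and the
# payoff function here are toy stand-ins.

rng = np.random.default_rng(0)

def decode_partner(z):
    """Frozen generative model: latent vector -> partner behaviour parameter."""
    return np.tanh(z).mean()          # toy 'policy' summarised by one scalar

def cooperator_payoff(partner_param):
    """Toy team payoff: the Cooperator copes badly with extreme partners."""
    return 1.0 - partner_param ** 2

def adversarial_latent_search(n_samples=256, dim=4):
    """Pick the latent whose decoded partner minimises the Cooperator's payoff."""
    zs = rng.normal(size=(n_samples, dim))
    payoffs = [cooperator_payoff(decode_partner(z)) for z in zs]
    return zs[int(np.argmin(payoffs))], min(payoffs)

z_star, worst = adversarial_latent_search()
```

Keeping the decoder frozen, as the abstract notes, confines the search to latents that decode to plausible partners, which is what prevents degenerate adversarial exploitation.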
GRPO-GCC: Enhancing Cooperation in Spatial Public Goods Games via Group Relative Policy Optimization with Global Cooperation Constraint
Yang, Zhaoqilin, Li, Chanchan, Liu, Tianqi, Zhao, Hongxin, Tian, Youliang
Inspired by the principle of self-regulating cooperation in collective institutions, we propose the Group Relative Policy Optimization with Global Cooperation Constraint (GRPO-GCC) framework. This work is the first to introduce GRPO into spatial public goods games, establishing a new deep reinforcement learning baseline for structured populations. GRPO-GCC integrates group relative policy optimization with a global cooperation constraint that strengthens incentives at intermediate cooperation levels while weakening them at extremes. This mechanism aligns local decision making with sustainable collective outcomes and prevents collapse into either universal defection or unconditional cooperation. The framework advances beyond existing approaches by combining group-normalized advantage estimation, a reference-anchored KL penalty, and a global incentive term that dynamically adjusts cooperative payoffs. As a result, it achieves accelerated cooperation onset, stabilized policy adaptation, and long-term sustainability. GRPO-GCC demonstrates how a simple yet global signal can reshape incentives toward resilient cooperation, and provides a new paradigm for multi-agent reinforcement learning in socio-technical systems.
- Asia > China > Guizhou Province (0.14)
- Asia > China > Beijing > Beijing (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
- Social Sector (0.48)
- Law (0.46)
- Government (0.46)
- Information Technology > Security & Privacy (0.46)
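Two of the ingredients named in the GRPO-GCC abstract can be hinted at with a toy sketch: group-relative (group-normalized) advantage estimation, and a global cooperation term that is strongest at intermediate cooperation levels and vanishes at the extremes. The functional form of the global term and all constants below are assumptions, not the authors' equations, and the reference-anchored KL penalty is omitted.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantage: normalise each reward by the group mean and std."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards) or 1.0   # guard against zero variance
    return [(r - mu) / sd for r in rewards]

def global_cooperation_bonus(coop_fraction, strength=1.0):
    """Assumed form: peaks at coop_fraction = 0.5, zero at 0 and 1, so the
    incentive is strongest at intermediate cooperation levels."""
    return strength * coop_fraction * (1.0 - coop_fraction)

adv = group_relative_advantages([1.0, 2.0, 3.0, 6.0])
shaped = [a + global_cooperation_bonus(0.5) for a in adv]
```

By construction the normalized advantages sum to zero within the group, and the bonus term reshapes them only away from the two extremes of universal defection and unconditional cooperation.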
- North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
- Asia > China > Jiangsu Province > Yancheng (0.04)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (0.93)
- Research Report > New Finding (0.93)
Cooperation in public goods game on regular lattices with agents changing interaction groups
The emergence of cooperation in groups of interacting agents is one of the most fascinating phenomena observed in complex systems studied in social science and ecology, even in situations where one would expect agents to free-ride. This is especially surprising when no external mechanisms based on reputation or punishment are present. One possible explanation of this effect is the inhomogeneity of various aspects of the interactions, which can help clarify the seemingly paradoxical behavior. In this report we demonstrate that the diversity of interaction networks helps, to some degree, to explain the emergence of cooperation. We extend the model of spatial interaction diversity introduced in [L. Shang et al., Physica A, 593:126999 (2022)] by enabling the re-evaluation of interaction groups. We show that the process of re-evaluating the interaction group facilitates the emergence of cooperation. Furthermore, we also observe that a large share of agents switching their interaction neighborhoods has a negative impact on the formation of cooperation. The introduced scenario can help explain the formation of cooperation in systems where no additional mechanisms for controlling agents are included.
- South America > Colombia (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (3 more...)
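A minimal sketch (toy parameters on a small periodic lattice, not the paper's full model) of the spatial public-goods setup and of what "re-evaluating" an interaction group means: an agent's payoff comes from the group centred on itself, and switching means joining the group centred on a neighbour instead.

```python
import random

# Public-goods game on a regular lattice with von Neumann neighbourhoods.
# All parameter values are hypothetical.
L_SIZE, R_MULT, COST = 4, 3.0, 1.0   # lattice side, multiplier, cost (assumed)
random.seed(1)
strategy = {(x, y): random.random() < 0.5   # True = cooperate
            for x in range(L_SIZE) for y in range(L_SIZE)}

def group(centre):
    """Five-member group centred on a lattice site (periodic boundaries)."""
    x, y = centre
    return [((x + dx) % L_SIZE, (y + dy) % L_SIZE)
            for dx, dy in [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]]

def pgg_payoff(agent, centre):
    """Agent's payoff from playing in the group centred at `centre`."""
    members = group(centre)
    pot = R_MULT * COST * sum(strategy[m] for m in members)
    return pot / len(members) - (COST if strategy[agent] else 0.0)

# Group re-evaluation: an agent compares its current group with one centred
# on a random neighbour and could switch if the alternative pays more.
agent = (0, 0)
alt_centre = random.choice(group(agent)[1:])
better = pgg_payoff(agent, alt_centre) > pgg_payoff(agent, agent)
```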
Supplementary Material
We provide additional results for EGTA applied to networked MARL system control for CPR management. Specifically, we investigate the consequences of different reward structures. Potential Nash equilibria are shaded in blue. NeurComm (across all values of α), which is likely due to its consensus update mechanism. The orange ovals in these diagrams indicate which system configurations correspond to the highest expected payoff for all agents.
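The shading of potential Nash equilibria in an empirical (EGTA) payoff table can be reproduced with a simple best-response check; the payoff matrix below is a hypothetical toy, not the paper's data.

```python
# Identify pure Nash cells in a symmetric two-player empirical game:
# a cell (i, j) is an equilibrium if neither player gains by deviating.

def pure_nash_cells(payoff):
    """payoff[i][j] = row player's payoff for (row strat i, col strat j);
    the game is assumed symmetric, so the column player gets payoff[j][i]."""
    n = len(payoff)
    cells = []
    for i in range(n):
        for j in range(n):
            row_best = all(payoff[i][j] >= payoff[k][j] for k in range(n))
            col_best = all(payoff[j][i] >= payoff[k][i] for k in range(n))
            if row_best and col_best:
                cells.append((i, j))
    return cells

# Toy 2-strategy dilemma: strategy 1 (defect) dominates, so (1, 1) is the
# unique pure equilibrium even though mutual (0, 0) pays more.
pd_matrix = [[3.0, 0.0],
             [5.0, 1.0]]
```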
Dilution, Diffusion and Symbiosis in Spatial Prisoner's Dilemma with Reinforcement Learning
Mangold, Gustavo C., Fernandes, Heitor C. M., Vainstein, Mendeli H.
Recent studies of spatial prisoner's dilemma games with reinforcement learning have shown that static agents can learn to cooperate through a diverse set of mechanisms, including noise injection, different types of learning algorithms, and knowledge of neighbours' payoffs. In this work, using an independent multi-agent Q-learning algorithm, we study the effects of dilution and mobility in the spatial version of the prisoner's dilemma. Within this setting, different possible actions for the algorithm are defined, connecting with previous results on the classical, non-reinforcement-learning spatial prisoner's dilemma and showcasing the versatility of the algorithm in modeling different game-theoretical scenarios, as well as the benchmarking potential of this approach. As a result, a range of effects is observed, including evidence that games with fixed update rules can be qualitatively equivalent to those with learned ones, as well as the emergence of a symbiotic mutualistic effect between populations that forms when multiple actions are defined.
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
- North America > United States > Michigan (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
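A toy sketch of the independent Q-learning rule such models rely on, reduced to two agents repeatedly playing the prisoner's dilemma; the payoffs and hyperparameters are assumptions, and the spatial lattice, dilution, and mobility are omitted.

```python
import random

# Independent tabular Q-learning in a repeated prisoner's dilemma.
# Each agent keeps its own Q-table over {cooperate, defect} and updates it
# from the payoff of the last round only. All constants are hypothetical.
T_PAY, R_PAY, P_PAY, S_PAY = 5.0, 3.0, 1.0, 0.0   # PD: T > R > P > S
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1                  # learning rate, discount, exploration

def pd_payoff(my_action, other_action):
    table = {("C", "C"): R_PAY, ("C", "D"): S_PAY,
             ("D", "C"): T_PAY, ("D", "D"): P_PAY}
    return table[(my_action, other_action)]

def q_update(q, action, reward):
    """One-state Q-learning: Q(a) += alpha * (r + gamma * max_a' Q(a') - Q(a))."""
    q[action] += ALPHA * (reward + GAMMA * max(q.values()) - q[action])

def choose(q):
    """Epsilon-greedy action selection."""
    return random.choice(["C", "D"]) if random.random() < EPS else max(q, key=q.get)

random.seed(0)
q_a = {"C": 0.0, "D": 0.0}
q_b = {"C": 0.0, "D": 0.0}
for _ in range(500):
    a, b = choose(q_a), choose(q_b)
    q_update(q_a, a, pd_payoff(a, b))
    q_update(q_b, b, pd_payoff(b, a))
```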
Modeling Latent Partner Strategies for Adaptive Zero-Shot Human-Agent Collaboration
Li, Benjamin, Shi, Shuyang, Romero, Lucia, Li, Huao, Xie, Yaqi, Kim, Woojun, Nikolaidis, Stefanos, Lewis, Michael, Sycara, Katia, Stepputtis, Simon
In collaborative tasks, being able to adapt to one's teammates is a necessary requirement for success. When teammates are heterogeneous, as in human-agent teams, agents need to observe, recognize, and adapt to their human partners in real time. This becomes particularly challenging in tasks with time pressure and complex strategic spaces whose dynamics can change rapidly. In this work, we introduce TALENTS, a strategy-conditioned cooperator framework that learns to represent, categorize, and adapt to a range of partner strategies, enabling ad-hoc teamwork. Our approach uses a variational autoencoder to learn a latent strategy space from trajectory data; this latent space represents the underlying strategies that agents employ. The system then identifies different strategy types by clustering the data. Finally, a cooperator agent is trained to generate partners for each strategy type, conditioned on these clusters. To adapt to previously unseen partners, we leverage a fixed-share regret-minimization algorithm that dynamically infers and adjusts the estimated partner strategy. We assess our approach in a customized version of the Overcooked environment, posing a challenging cooperative cooking task that demands strong coordination across a wide range of possible strategies. Using an online user study, we show that our agent outperforms current baselines when working with unfamiliar human partners.
- North America > United States > California (0.14)
- North America > United States > Virginia (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report (1.00)
- Questionnaire & Opinion Survey (0.86)
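The fixed-share regret-minimization step mentioned in the TALENTS abstract is a standard online-learning rule; a minimal sketch with assumed constants (not the paper's implementation): keep a weight per partner-strategy cluster, update multiplicatively from observed losses, and mix a small share of weight uniformly so the estimate can track a partner who switches strategy mid-game.

```python
import math

def fixed_share_update(weights, losses, eta=0.5, share=0.05):
    """One exponential-weights step followed by fixed-share redistribution."""
    w = [wi * math.exp(-eta * li) for wi, li in zip(weights, losses)]
    total = sum(w)
    w = [wi / total for wi in w]                   # renormalise
    k = len(w)
    return [(1 - share) * wi + share / k for wi in w]

# Three hypothetical strategy clusters; cluster 1 keeps fitting best (loss 0).
w = [1 / 3] * 3
for _ in range(20):
    w = fixed_share_update(w, losses=[1.0, 0.0, 1.0])
```

The uniform share keeps every cluster's weight bounded away from zero, which is exactly what allows fast re-identification when the partner's strategy changes.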