Goto

Collaborating Authors

 Agent Societies


Learning Multi-Agent Communication from Graph Modeling Perspective

arXiv.org Artificial Intelligence

In numerous artificial intelligence applications, the collaborative efforts of multiple intelligent agents are imperative for the successful attainment of target objectives. To enhance coordination among these agents, a distributed communication framework is often employed. However, information sharing among all agents proves to be resource-intensive, while the adoption of a manually pre-defined communication architecture imposes limitations on inter-agent communication, thereby constraining the potential for collaborative efforts. In this study, we introduce a novel approach wherein we conceptualize the communication architecture among agents as a learnable graph. We formulate this problem as the task of determining the communication graph while enabling the architecture parameters to update normally, thus necessitating a bi-level optimization process. Utilizing continuous relaxation of the graph representation and incorporating attention units, our proposed approach, CommFormer, efficiently optimizes the communication graph and concurrently refines architectural parameters through gradient descent in an end-to-end manner. Extensive experiments on a variety of cooperative tasks substantiate the robustness of our model across diverse cooperative scenarios, where agents are able to develop more coordinated and sophisticated strategies regardless of changes in the number of agents.


Group-Aware Coordination Graph for Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Cooperative Multi-Agent Reinforcement Learning (MARL) necessitates seamless collaboration among agents, often represented by an underlying relation graph. Existing methods for learning this graph primarily focus on agent-pair relations, neglecting higher-order relationships. While several approaches attempt to extend cooperation modelling to encompass behaviour similarities within groups, they commonly fall short in concurrently learning the latent graph, thereby constraining the information exchange among partially observed agents. To overcome these limitations, we present a novel approach to infer the Group-Aware Coordination Graph (GACG), which is designed to capture both the cooperation between agent pairs based on current observations and group-level dependencies from behaviour patterns observed across trajectories. This graph is further used in graph convolution for information exchange between agents during decision-making. To further ensure behavioural consistency among agents within the same group, we introduce a group distance loss, which promotes group cohesion and encourages specialization between groups. Our evaluations, conducted on StarCraft II micromanagement tasks, demonstrate GACG's superior performance. An ablation study further provides experimental evidence of the effectiveness of each component of our method.


LLM-Augmented Agent-Based Modelling for Social Simulations: Challenges and Opportunities

arXiv.org Artificial Intelligence

As large language models (LLMs) continue to make significant strides, their better integration into agent-based simulations offers a transformational potential for understanding complex social systems. However, such integration is not trivial and poses numerous challenges. Based on this observation, in this paper, we explore architectures and methods to systematically develop LLM-augmented social simulations and discuss potential research directions in this field. We conclude that integrating LLMs with agent-based simulations offers a powerful toolset for researchers and scientists, allowing for more nuanced, realistic, and comprehensive models of complex systems and human behaviours.


Scalable Decentralized Algorithms for Online Personalized Mean Estimation

arXiv.org Artificial Intelligence

In numerous settings, agents lack sufficient data to directly learn a model. Collaborating with other agents may help, but it introduces a bias-variance trade-off, when local data distributions differ. A key challenge is for each agent to identify clients with similar distributions while learning the model, a problem that remains largely unresolved. This study focuses on a simplified version of the overarching problem, where each agent collects samples from a real-valued distribution over time to estimate its mean. Existing algorithms face impractical space and time complexities (quadratic in the number of agents A). To address scalability challenges, we propose a framework where agents self-organize into a graph, allowing each agent to communicate with only a selected number of peers r. We introduce two collaborative mean estimation algorithms: one draws inspiration from belief propagation, while the other employs a consensus-based approach, with complexity of O( r |A| log |A|) and O(r |A|), respectively. We establish conditions under which both algorithms yield asymptotically optimal estimates and offer a theoretical characterization of their performance.


Mitigating Negative Side Effects in Multi-Agent Systems Using Blame Assignment

arXiv.org Artificial Intelligence

When agents that are independently trained (or designed) to complete their individual tasks are deployed in a shared environment, their joint actions may produce negative side effects (NSEs). As their training does not account for the behavior of other agents or their joint action effects on the environment, the agents have no prior knowledge of the NSEs of their actions. We model the problem of mitigating NSEs in a cooperative multi-agent system as a Lexicographic Decentralized Markov Decision Process with two objectives. The agents must optimize the completion of their assigned tasks while mitigating NSEs. We assume independence of transitions and rewards with respect to the agents' tasks but the joint NSE penalty creates a form of dependence in this setting. To improve scalability, the joint NSE penalty is decomposed into individual penalties for each agent using credit assignment, which facilitates decentralized policy computation. Our results in simulation on three domains demonstrate the effectiveness and scalability of our approach in mitigating NSEs by updating the policies of a subset of agents in the system.


A Guide to Re-Implementing Agent-based Models: Experiences from the HUMAT Model

arXiv.org Artificial Intelligence

Replicating existing agent-based models poses significant challenges, particularly for those new to the field. This article presents an all-encompassing guide to re-implementing agent-based models, encompassing vital concepts such as comprehending the original model, utilizing agent-based modeling frameworks, simulation design, model validation, and more. By embracing the proposed guide, researchers and practitioners can gain a profound understanding of the entire re-implementation process, resulting in heightened accuracy and reliability of simulations for complex systems. Furthermore, this article showcases the re-implementation of the HUMAT socio-cognitive architecture, with a specific focus on designing a versatile, language-independent model. The encountered challenges and pitfalls in the re-implementation process are thoroughly discussed, empowering readers with practical insights. Embrace this guide to expedite model development while ensuring robust and precise simulations.


Select to Perfect: Imitating desired behavior from large multi-agent data

arXiv.org Artificial Intelligence

AI agents are commonly trained with large datasets of demonstrations of human behavior. However, not all behaviors are equally safe or desirable. Desired characteristics for an AI agent can be expressed by assigning desirability scores, which we assume are not assigned to individual behaviors but to collective trajectories. For example, in a dataset of vehicle interactions, these scores might relate to the number of incidents that occurred. We first assess the effect of each individual agent's behavior on the collective desirability score, e.g., assessing how likely an agent is to cause incidents. This allows us to selectively imitate agents with a positive effect, e.g., only imitating agents that are unlikely to cause incidents. To enable this, we propose the concept of an agent's Exchange Value, which quantifies an individual agent's contribution to the collective desirability score. The Exchange Value is the expected change in desirability score when substituting the agent for a randomly selected agent. We propose additional methods for estimating Exchange Values from real-world datasets, enabling us to learn desired imitation policies that outperform relevant baselines. Imitating human behaviors from large datasets is a promising technique for achieving human-AI and AI-AI interactions in complex environments (Carroll et al., 2019;, FAIR; He et al., 2023; Shih et al., 2022). However, such large datasets can contain undesirable human behaviors, making direct imitation problematic. Rather than imitating all behaviors, it may be preferable to ensure that AI agents imitate behaviors that align with predefined desirable characteristics. In this work, we assume that desirable characteristics are quantified as desirability scores given for each trajectory in the dataset.


Modelling Opaque Bilateral Market Dynamics in Financial Trading: Insights from a Multi-Agent Simulation Study

arXiv.org Artificial Intelligence

Exploring complex adaptive financial trading environments through multi-agent based simulation methods presents an innovative approach within the realm of quantitative finance. Despite the dominance of multi-agent reinforcement learning approaches in financial markets with observable data, there exists a set of systematically significant financial markets that pose challenges due to their partial or obscured data availability. We, therefore, devise a multi-agent simulation approach employing small-scale meta-heuristic methods. This approach aims to represent the opaque bilateral market for Australian government bond trading, capturing the bilateral nature of bank-to-bank trading, also referred to as "over-the-counter" (OTC) trading, and commonly occurring between "market makers". The uniqueness of the bilateral market, characterized by negotiated transactions and a limited number of agents, yields valuable insights for agent-based modelling and quantitative finance. The inherent rigidity of this market structure, which is at odds with the global proliferation of multilateral platforms and the decentralization of finance, underscores the unique insights offered by our agent-based model. We explore the implications of market rigidity on market structure and consider the element of stability, in market design. This extends the ongoing discourse on complex financial trading environments, providing an enhanced understanding of their dynamics and implications.


SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Multi-agent systems (MAS) need to adaptively cope with dynamic environments, changing agent populations, and diverse tasks. However, most of the multi-agent systems cannot easily handle them, due to the complexity of the state and task space. The social impact theory regards the complex influencing factors as forces acting on an agent, emanating from the environment, other agents, and the agent's intrinsic motivation, referring to the social force. Inspired by this concept, we propose a novel gradient-based state representation for multi-agent reinforcement learning. To non-trivially model the social forces, we further introduce a data-driven method, where we employ denoising score matching to learn the social gradient fields (SocialGFs) from offline samples, e.g., the attractive or repulsive outcomes of each force. During interactions, the agents take actions based on the multi-dimensional gradients to maximize their own rewards. In practice, we integrate SocialGFs into the widely used multi-agent reinforcement learning algorithms, e.g., MAPPO. The empirical results reveal that SocialGFs offer four advantages for multi-agent systems: 1) they can be learned without requiring online interaction, 2) they demonstrate transferability across diverse tasks, 3) they facilitate credit assignment in challenging reward settings, and 4) they are scalable with the increasing number of agents.


MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure

arXiv.org Artificial Intelligence

Multi-agent reinforcement learning (MARL) algorithms often struggle to find strategies close to Pareto optimal Nash Equilibrium, owing largely to the lack of efficient exploration. The problem is exacerbated in sparse-reward settings, caused by the larger variance exhibited in policy learning. This paper introduces MESA, a novel meta-exploration method for cooperative multi-agent learning. It learns to explore by first identifying the agents' high-rewarding joint state-action subspace from training tasks and then learning a set of diverse exploration policies to "cover" the subspace. These trained exploration policies can be integrated with any off-policy MARL algorithm for test-time tasks. We first showcase MESA's advantage in a multi-step matrix game. Furthermore, experiments show that with learned exploration policies, MESA achieves significantly better performance in sparse-reward tasks in several multi-agent particle environments and multi-agent MuJoCo environments, and exhibits the ability to generalize to more challenging tasks at test time.