Agent Societies
Social Learning through Interactions with Other Agents: A Survey
Hillier, Dylan, Tan, Cheston, Jiang, Jing
Social learning plays an important role in the development of human intelligence. As children, we imitate our parents' speech patterns until we are able to produce sounds; we learn from them praising us and scolding us; and as adults, we learn by working with others. In this work, we survey the degree to which this paradigm -- social learning -- has been mirrored in machine learning. In particular, since learning socially requires interacting with others, we are interested in how embodied agents can and have utilised these techniques. This is especially in light of the degree to which recent advances in natural language processing (NLP) enable us to perform new forms of social learning. We look at how behavioural cloning and next-token prediction mirror human imitation, how learning from human feedback mirrors human education, and how we can go further to enable fully communicative agents that learn from each other. We find that while individual social learning techniques have been used successfully, there has been little unifying work showing how to bring them together into socially embodied agents.
Self-Emotion Blended Dialogue Generation in Social Simulation Agents
Zhang, Qiang, Naradowsky, Jason, Miyao, Yusuke
When engaging in conversations, dialogue agents in a virtual simulation environment may exhibit their own emotional states that are unrelated to the immediate conversational context, a phenomenon known as self-emotion. This study explores how such self-emotion affects the agents' behaviors in dialogue strategies and decision-making within a large language model (LLM)-driven simulation framework. In a dialogue strategy prediction experiment, we analyze the dialogue strategy choices employed by agents both with and without self-emotion, comparing them to those of humans. The results show that incorporating self-emotion helps agents exhibit more human-like dialogue strategies. In an independent experiment comparing the performance of models fine-tuned on GPT-4 generated dialogue datasets, we demonstrate that self-emotion can lead to better overall naturalness and humanness. Finally, in a virtual simulation environment where agents have discussions on multiple topics, we show that self-emotion of agents can significantly influence the decision-making process of the agents, leading to approximately a 50% change in decisions.
GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS
Kazemkhani, Saman, Pandya, Aarav, Cornelisse, Daphne, Shacklett, Brennan, Vinitsky, Eugene
Multi-agent learning algorithms have been successful at generating superhuman planning in a wide variety of games but have had little impact on the design of deployed multi-agent planners. A key bottleneck in applying these techniques to multi-agent planning is that they require billions of steps of experience. To enable the study of multi-agent planning at this scale, we present GPUDrive, a GPU-accelerated, multi-agent simulator built on top of the Madrona Game Engine that can generate over a million steps of experience per second. Observation, reward, and dynamics functions are written directly in C++, allowing users to define complex, heterogeneous agent behaviors that are lowered to high-performance CUDA. We show that using GPUDrive we are able to effectively train reinforcement learning agents over many scenes in the Waymo Motion dataset, yielding highly effective goal-reaching agents in minutes for individual scenes and generally capable agents in a few hours.
Y Social: an LLM-powered Social Media Digital Twin
Rossetti, Giulio, Stella, Massimo, Cazabet, Rรฉmy, Abramski, Katherine, Cau, Erica, Citraro, Salvatore, Failla, Andrea, Improta, Riccardo, Morini, Virginia, Pansanella, Valentina
Online social media (OSM henceforth) have revolutionized the way we exchange information. From the user's perspective, these digital ecosystems are largely effortless [136], enabling convenient ways of exchanging personal content [1], seeking information [129] and synchronizing with others [37]. This convenience has catalyzed a massive digital shift in social and information exchanges from offline to online settings [136], which has provided novel access to massive amounts of online data regarding human behaviour [141]. Unconstrained by geographical barriers, the massive adoption of social media has given rise to novel phenomena that are absent in in-person interactions, such as the influence of complexity and artificial intelligence. Complexity in social media is strongly related to the motto "more is different" [7]: the idea that the co-occurrence of many, even similar, interactions within the same context can lead to unexpected phenomena. Examples include acts as simple and seemingly insignificant as following another user, or re-sharing content. Taken individually, these actions can be understood in terms of a user's activity, psychology, and engagement [91, 97, 141], but when repeated by vast amounts of users, these actions can determine the unexpected rise
Learning in Multi-Objective Public Goods Games with Non-Linear Utilities
Orzan, Nicole, Acar, Erman, Grossi, Davide, Mannion, Patrick, Rฤdulescu, Roxana
Addressing the question of how to achieve optimal decision-making under risk and uncertainty is crucial for enhancing the capabilities of artificial agents that collaborate with or support humans. In this work, we address this question in the context of Public Goods Games. We study learning in a novel multi-objective version of the Public Goods Game where agents have different risk preferences, by means of multi-objective reinforcement learning. We introduce a parametric non-linear utility function to model risk preferences at the level of individual agents, over the collective and individual reward components of the game. We study the interplay between such preference modelling and environmental uncertainty on the incentive alignment level in the game. We demonstrate how different combinations of individual preferences and environmental uncertainties sustain the emergence of cooperative patterns in non-cooperative environments (i.e., where competitive strategies are dominant), while others sustain competitive patterns in cooperative environments (i.e., where cooperative strategies are dominant).
Multi-agent Assessment with QoS Enhancement for HD Map Updates in a Vehicular Network
Redondo, Jeffrey, Aslam, Nauman, Zhang, Juan, Yuan, Zhenhui
Reinforcement Learning (RL) algorithms have been used to address the challenging problems in the offloading process of vehicular ad hoc networks (VANET). More recently, they have been utilized to improve the dissemination of high-definition (HD) Maps. Nevertheless, implementing solutions such as deep Q-learning (DQN) and Actor-critic at the autonomous vehicle (AV) may lead to an increase in the computational load, causing a heavy burden on the computational devices and higher costs. Moreover, their implementation might raise compatibility issues between technologies due to the required modifications to the standards. Therefore, in this paper, we assess the scalability of an application utilizing a Q-learning single-agent solution in a distributed multi-agent environment. This application improves the network performance by taking advantage of a smaller state, and action space whilst using a multi-agent approach. The proposed solution is extensively evaluated with different test cases involving reward function considering individual or overall network performance, number of agents, and centralized and distributed learning comparison. The experimental results demonstrate that the time latencies of our proposed solution conducted in voice, video, HD Map, and best-effort cases have significant improvements, with 40.4%, 36%, 43%, and 12% respectively, compared to the performances with the single-agent approach.
B2MAPO: A Batch-by-Batch Multi-Agent Policy Optimization to Balance Performance and Efficiency
Zhang, Wenjing, Zhang, Wei, Hu, Wenqing, Wang, Yifan
Most multi-agent reinforcement learning approaches adopt two types of policy optimization methods that either update policy simultaneously or sequentially. Simultaneously updating policies of all agents introduces non-stationarity problem. Although sequentially updating policies agent-by-agent in an appropriate order improves policy performance, it is prone to low efficiency due to sequential execution, resulting in longer model training and execution time. Intuitively, partitioning policies of all agents according to their interdependence and updating joint policy batch-by-batch can effectively balance performance and efficiency. However, how to determine the optimal batch partition of policies and batch updating order are challenging problems. Firstly, a sequential batched policy updating scheme, B2MAPO (Batch by Batch Multi-Agent Policy Optimization), is proposed with a theoretical guarantee of the monotonic incrementally tightened bound. Secondly, a universal modulized plug-and-play B2MAPO hierarchical framework, which satisfies CTDE principle, is designed to conveniently integrate any MARL models to fully exploit and merge their merits, including policy optimality and inference efficiency. Next, a DAG-based B2MAPO algorithm is devised, which is a carefully designed implementation of B2MAPO framework. Comprehensive experimental results conducted on StarCraftII Multi-agent Challenge and Google Football Research demonstrate the performance of DAG-based B2MAPO algorithm outperforms baseline methods. Meanwhile, compared with A2PO, our algorithm reduces the model training and execution time by 60.4% and 78.7%, respectively.
Mechanism Design for Locating Facilities with Capacities with Insufficient Resources
Auricchio, Gennaro, Clough, Harry J., Zhang, Jie
This paper explores the Mechanism Design aspects of the $m$-Capacitated Facility Location Problem where the total facility capacity is less than the number of agents. Following the framework outlined by Aziz et al., the Social Welfare of the facility location is determined through a First-Come-First-Served (FCFS) game, in which agents compete once the facility positions are established. When the number of facilities is $m > 1$, the Nash Equilibrium (NE) of the FCFS game is not unique, making the utility of the agents and the concept of truthfulness unclear. To tackle these issues, we consider absolutely truthful mechanisms, i.e. mechanisms that prevent agents from misreporting regardless of the strategies used during the FCFS game. We combine this stricter truthfulness requirement with the notion of Equilibrium Stable (ES) mechanisms, which are mechanisms whose Social Welfare does not depend on the NE of the FCFS game. We demonstrate that the class of percentile mechanisms is absolutely truthful and identify the conditions under which they are ES. We also show that the approximation ratio of each ES percentile mechanism is bounded and determine its value. Notably, when all the facilities have the same capacity and the number of agents is sufficiently large, it is possible to achieve an approximation ratio smaller than $1+\frac{1}{2m-1}$. Finally, we extend our study to encompass higher-dimensional problems. Within this framework, we demonstrate that the class of ES percentile mechanisms is even more restricted and characterize the mechanisms that are both ES and absolutely truthful. We further support our findings by empirically evaluating the performance of the mechanisms when the agents are the samples of a distribution.
Relational Q-Functionals: Multi-Agent Learning to Recover from Unforeseen Robot Malfunctions in Continuous Action Domains
Findik, Yasin, Robinette, Paul, Jerath, Kshitij, Azadeh, Reza
Cooperative multi-agent learning methods are essential in developing effective cooperation strategies in multi-agent domains. In robotics, these methods extend beyond multi-robot scenarios to single-robot systems, where they enable coordination among different robot modules (e.g., robot legs or joints). However, current methods often struggle to quickly adapt to unforeseen failures, such as a malfunctioning robot leg, especially after the algorithm has converged to a strategy. To overcome this, we introduce the Relational Q-Functionals (RQF) framework. RQF leverages a relational network, representing agents' relationships, to enhance adaptability, providing resilience against malfunction(s). Our algorithm also efficiently handles continuous state-action domains, making it adept for robotic learning tasks. Our empirical results show that RQF enables agents to use these relationships effectively to facilitate cooperation and recover from an unexpected malfunction in single-robot systems with multiple interacting modules. Thus, our approach offers promising applications in multi-agent systems, particularly in scenarios with unforeseen malfunctions.
Collaborative Adaptation for Recovery from Unforeseen Malfunctions in Discrete and Continuous MARL Domains
Findik, Yasin, Hasenfus, Hunter, Azadeh, Reza
Cooperative multi-agent learning plays a crucial role for developing effective strategies to achieve individual or shared objectives in multi-agent teams. In real-world settings, agents may face unexpected failures, such as a robot's leg malfunctioning or a teammate's battery running out. These malfunctions decrease the team's ability to accomplish assigned task(s), especially if they occur after the learning algorithms have already converged onto a collaborative strategy. Current leading approaches in Multi-Agent Reinforcement Learning (MARL) often recover slowly -- if at all -- from such malfunctions. To overcome this limitation, we present the Collaborative Adaptation (CA) framework, highlighting its unique capability to operate in both continuous and discrete domains. Our framework enhances the adaptability of agents to unexpected failures by integrating inter-agent relationships into their learning processes, thereby accelerating the recovery from malfunctions. We evaluated our framework's performance through experiments in both discrete and continuous environments. Empirical results reveal that in scenarios involving unforeseen malfunction, although state-of-the-art algorithms often converge on sub-optimal solutions, the proposed CA framework mitigates and recovers more effectively.