Agent Societies
Constructing games on networks for controlling the inequalities in the capital distribution
The inequality in capital or resource distribution is among the important phenomena observed in populations. The sources of inequality and methods for controlling it are of practical interest. To study this phenomenon, we introduce a model of interaction between agents in the network designed for reducing the inequality in the distribution of capital. To achieve the effect of inequality reduction, we interpret the outcome of the elementary game played in the network such that the wining of the game is translated into the reduction of the inequality. We study different interpretations of the introduced scheme and their impact on the behaviour of agents in the terms of the capital distribution, and we provide examples based on the capital dependent Parrondo's paradox. The results presented in this study provide insight into the mechanics of the inequality formation in the society.
Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming
Konan, Sachin, Seraj, Esmaeil, Gombolay, Matthew
Information sharing is key in building team cognition and enables coordination and cooperation. High-performing human teams also benefit from acting strategically with hierarchical levels of iterated communication and rationalizability, meaning a human agent can reason about the actions of their teammates in their decision-making. Yet, the majority of prior work in Multi-Agent Reinforcement Learning (MARL) does not support iterated rationalizability and only encourage inter-agent communication, resulting in a suboptimal equilibrium cooperation strategy. In this work, we show that reformulating an agent's policy to be conditional on the policies of its neighboring teammates inherently maximizes Mutual Information (MI) lower-bound when optimizing under Policy Gradient (PG). Building on the idea of decision-making under bounded rationality and cognitive hierarchy theory, we show that our modified PG approach not only maximizes local agent rewards but also implicitly reasons about MI between agents without the need for any explicit ad-hoc regularization terms. Our approach, InfoPG, outperforms baselines in learning emergent collaborative behaviors and sets the state-of-the-art in decentralized cooperative MARL tasks. Our experiments validate the utility of InfoPG by achieving higher sample efficiency and significantly larger cumulative reward in several complex cooperative multi-agent domains.
Multi-agent Covering Option Discovery based on Kronecker Product of Factor Graphs
Chen, Jiayu, Chen, Jingdi, Lan, Tian, Aggarwal, Vaneet
Covering option discovery has been developed to improve the exploration of reinforcement learning in single-agent scenarios with sparse reward signals, through connecting the most distant states in the embedding space provided by the Fiedler vector of the state transition graph. However, these option discovery methods cannot be directly extended to multi-agent scenarios, since the joint state space grows exponentially with the number of agents in the system. Thus, existing researches on adopting options in multi-agent scenarios still rely on single-agent option discovery and fail to directly discover the joint options that can improve the connectivity of the joint state space of agents. In this paper, we show that it is indeed possible to directly compute multi-agent options with collaborative exploratory behaviors among the agents, while still enjoying the ease of decomposition. Our key idea is to approximate the joint state space as a Kronecker graph -- the Kronecker product of individual agents' state transition graphs, based on which we can directly estimate the Fiedler vector of the joint state space using the Laplacian spectrum of individual agents' transition graphs. This decomposition enables us to efficiently construct multi-agent joint options by encouraging agents to connect the sub-goal joint states which are corresponding to the minimum or maximum values of the estimated joint Fiedler vector. The evaluation based on multi-agent collaborative tasks shows that the proposed algorithm can successfully identify multi-agent options, and significantly outperforms prior works using single-agent options or no options, in terms of both faster exploration and higher cumulative rewards.
K-nearest Multi-agent Deep Reinforcement Learning for Collaborative Tasks with a Variable Number of Agents
Khorasgani, Hamed, Wang, Haiyan, Tang, Hsiu-Khuern, Gupta, Chetan
Traditionally, the performance of multi-agent deep reinforcement learning algorithms are demonstrated and validated in gaming environments where we often have a fixed number of agents. In many industrial applications, the number of available agents can change at any given day and even when the number of agents is known ahead of time, it is common for an agent to break during the operation and become unavailable for a period of time. In this paper, we propose a new deep reinforcement learning algorithm for multi-agent collaborative tasks with a variable number of agents. We demonstrate the application of our algorithm using a fleet management simulator developed by Hitachi to generate realistic scenarios in a production site.
Pavlovian Signalling with General Value Functions in Agent-Agent Temporal Decision Making
Butcher, Andrew, Johanson, Michael Bradley, Davoodi, Elnaz, Brenneis, Dylan J. A., Acker, Leslie, Parker, Adam S. R., White, Adam, Modayil, Joseph, Pilarski, Patrick M.
In this paper, we contribute a multi-faceted study into Pavlovian signalling -- a process by which learned, temporally extended predictions made by one agent inform decision-making by another agent. Signalling is intimately connected to time and timing. In service of generating and receiving signals, humans and other animals are known to represent time, determine time since past events, predict the time until a future stimulus, and both recognize and generate patterns that unfold in time. We investigate how different temporal processes impact coordination and signalling between learning agents by introducing a partially observable decision-making domain we call the Frost Hollow. In this domain, a prediction learning agent and a reinforcement learning agent are coupled into a two-part decision-making system that works to acquire sparse reward while avoiding time-conditional hazards. We evaluate two domain variations: machine agents interacting in a seven-state linear walk, and human-machine interaction in a virtual-reality environment. Our results showcase the speed of learning for Pavlovian signalling, the impact that different temporal representations do (and do not) have on agent-agent coordination, and how temporal aliasing impacts agent-agent and human-agent interactions differently. As a main contribution, we establish Pavlovian signalling as a natural bridge between fixed signalling paradigms and fully adaptive communication learning between two agents. We further show how to computationally build this adaptive signalling process out of a fixed signalling process, characterized by fast continual prediction learning and minimal constraints on the nature of the agent receiving signals. Our results therefore suggest an actionable, constructivist path towards communication learning between reinforcement learning agents.
Distributed Cooperative Multi-Agent Reinforcement Learning with Directed Coordination Graph
Jing, Gangshan, Bai, He, George, Jemin, Chakrabortty, Aranya, Sharma, Piyush. K.
Existing distributed cooperative multi-agent reinforcement learning (MARL) frameworks usually assume undirected coordination graphs and communication graphs while estimating a global reward via consensus algorithms for policy evaluation. Such a framework may induce expensive communication costs and exhibit poor scalability due to requirement of global consensus. In this work, we study MARLs with directed coordination graphs, and propose a distributed RL algorithm where the local policy evaluations are based on local value functions. The local value function of each agent is obtained by local communication with its neighbors through a directed learning-induced communication graph, without using any consensus algorithm. A zeroth-order optimization (ZOO) approach based on parameter perturbation is employed to achieve gradient estimation. By comparing with existing ZOO-based RL algorithms, we show that our proposed distributed RL algorithm guarantees high scalability. A distributed resource allocation example is shown to illustrate the effectiveness of our algorithm.
Deep Reinforcement Learning
Deep reinforcement learning has gathered much attention recently. Impressive results were achieved in activities as diverse as autonomous driving, game playing, molecular recombination, and robotics. In all these fields, computer programs have taught themselves to solve difficult problems. They have learned to fly model helicopters and perform aerobatic manoeuvers such as loops and rolls. In some applications they have even become better than the best humans, such as in Atari, Go, poker and StarCraft. The way in which deep reinforcement learning explores complex environments reminds us of how children learn, by playfully trying out things, getting feedback, and trying again. The computer seems to truly possess aspects of human learning; this goes to the heart of the dream of artificial intelligence. The successes in research have not gone unnoticed by educators, and universities have started to offer courses on the subject. The aim of this book is to provide a comprehensive overview of the field of deep reinforcement learning. The book is written for graduate students of artificial intelligence, and for researchers and practitioners who wish to better understand deep reinforcement learning methods and their challenges. We assume an undergraduate-level of understanding of computer science and artificial intelligence; the programming language of this book is Python. We describe the foundations, the algorithms and the applications of deep reinforcement learning. We cover the established model-free and model-based methods that form the basis of the field. Developments go quickly, and we also cover advanced topics: deep multi-agent reinforcement learning, deep hierarchical reinforcement learning, and deep meta learning.
Learning Complex Spatial Behaviours in ABM: An Experimental Observational Study
Olmez, Sedar, Birks, Dan, Heppenstall, Alison
Capturing and simulating intelligent adaptive behaviours within spatially explicit individual-based models remains an ongoing challenge for researchers. While an ever-increasing abundance of real-world behavioural data are collected, few approaches exist that can quantify and formalise key individual behaviours and how they change over space and time. Consequently, commonly used agent decision-making frameworks, such as event-condition-action rules, are often required to focus only on a narrow range of behaviours. We argue that these behavioural frameworks often do not reflect real-world scenarios and fail to capture how behaviours can develop in response to stimuli. There has been an increased interest in Machine Learning methods and their potential to simulate intelligent adaptive behaviours in recent years. One method that is beginning to gain traction in this area is Reinforcement Learning (RL). This paper explores how RL can be applied to create emergent agent behaviours using a simple predator-prey Agent-Based Model (ABM). Running a series of simulations, we demonstrate that agents trained using the novel Proximal Policy Optimisation (PPO) algorithm behave in ways that exhibit properties of real-world intelligent adaptive behaviours, such as hiding, evading and foraging.
Modeling Prejudice and Its Effect on Societal Prosperity
Mohan, Deep Inder, Verma, Arjun, Rao, Shrisha
Existing studies on prejudice, which is important in multi-group dynamics in societies, focus on the social-psychological knowledge behind the processes involving prejudice and its propagation. We instead create a multi-agent framework that simulates the propagation of prejudice and measures its tangible impact on the prosperity of individuals as well as of larger social structures, including groups and factions within. Groups in society help us define prejudice, and factions represent smaller tight-knit circles of individuals with similar opinions. We model social interactions using the Continuous Prisoner's Dilemma (CPD) and a type of agent called a prejudiced agent, whose cooperation is affected by a prejudice attribute, updated over time based both on the agent's own experiences and those of others in its faction. Our simulations show that modeling prejudice as an exclusively out-group phenomenon generates implicit in-group promotion, which eventually leads to higher relative prosperity of the prejudiced population. This skew in prosperity is shown to be correlated to factors such as size difference between groups and the number of prejudiced agents in a group. Although prejudiced agents achieve higher prosperity within prejudiced societies, their presence degrades the overall prosperity levels of their societies. Our proposed system model can serve as a basis for promoting a deeper understanding of origins, propagation, and ramifications of prejudice through rigorous simulative studies grounded in apt theoretical backgrounds. This can help conduct impactful research on prominent social issues such as racism, religious discrimination, and unfair immigrant treatment. This model can also serve as a foundation to study other socio-psychological phenomena in tandem with prejudice such as the distribution of wealth, social status, and ethnocentrism in a society.
Multi-agent Communication with Graph Information Bottleneck under Limited Bandwidth
Tian, Qi, Kuang, Kun, Wang, Baoxiang, Liu, Furui, Wu, Fei
Recent studies have shown that introducing communication between agents can significantly improve overall performance in cooperative Multi-agent reinforcement learning (MARL). In many real-world scenarios, communication can be expensive and the bandwidth of the multi-agent system is subject to certain constraints. Redundant messages who occupy the communication resources can block the transmission of informative messages and thus jeopardize the performance. In this paper, we aim to learn the minimal sufficient communication messages. First, we initiate the communication between agents by a complete graph. Then we introduce the graph information bottleneck (GIB) principle into this complete graph and derive the optimization over graph structures. Based on the optimization, a novel multi-agent communication module, called CommGIB, is proposed, which effectively compresses the structure information and node information in the communication graph to deal with bandwidth-constrained settings. Extensive experiments in Traffic Control and StanCraft II are conducted. The results indicate that the proposed methods can achieve better performance in bandwidth-restricted settings compared with state-of-the-art algorithms, with especially large margins in large-scale multi-agent tasks.