Agent Societies
Decomposed Soft Actor-Critic Method for Cooperative Multi-Agent Reinforcement Learning
Pu, Yuan, Wang, Shaochen, Yang, Rui, Yao, Xin, Li, Bin
Deep reinforcement learning methods have shown great performance on many challenging cooperative multi-agent tasks. Two main promising research directions are multi-agent value function decomposition and multi-agent policy gradients. In this paper, we propose a new decomposed multi-agent soft actor-critic (mSAC) method, which effectively combines the advantages of the aforementioned two methods. The main modules include decomposed Q network architecture, discrete probabilistic policy and counterfactual advantage function (optinal). Theoretically, mSAC supports efficient off-policy learning and addresses credit assignment problem partially in both discrete and continuous action spaces. Tested on StarCraft II micromanagement cooperative multiagent benchmark, we empirically investigate the performance of mSAC against its variants and analyze the effects of the different components. Experimental results demonstrate that mSAC significantly outperforms policy-based approach COMA, and achieves competitive results with SOTA value-based approach Qmix on most tasks in terms of asymptotic perfomance metric. In addition, mSAC achieves pretty good results on large action space tasks, such as 2c vs 64zg and MMM2.
Altruism Design in Networked Public Goods Games
Yu, Sixie, Kempe, David, Vorobeychik, Yevgeniy
Many collective decision-making settings feature a strategic tension between agents acting out of individual self-interest and promoting a common good. These include wearing face masks during a pandemic, voting, and vaccination. Networked public goods games capture this tension, with networks encoding strategic interdependence among agents. Conventional models of public goods games posit solely individual self-interest as a motivation, even though altruistic motivations have long been known to play a significant role in agents' decisions. We introduce a novel extension of public goods games to account for altruistic motivations by adding a term in the utility function that incorporates the perceived benefits an agent obtains from the welfare of others, mediated by an altruism graph. Most importantly, we view altruism not as immutable, but rather as a lever for promoting the common good. Our central algorithmic question then revolves around the computational complexity of modifying the altruism network to achieve desired public goods game investment profiles. We first show that the problem can be solved using linear programming when a principal can fractionally modify the altruism network. While the problem becomes in general intractable if the principal's actions are all-or-nothing, we exhibit several tractable special cases.
Multi-Agent Routing and Scheduling Through Coalition Formation
Capezzuto, Luca, Tarapore, Danesh, Ramchurn, Sarvapali D.
In task allocation for real-time domains, such as disaster response, a limited number of agents is deployed across a large area to carry out numerous tasks, each with its prerequisites, profit, time window and workload. To maximize profits while minimizing time penalties, agents need to cooperate by forming, disbanding and reforming coalitions. In this paper, we name this problem Multi-Agent Routing and Scheduling through Coalition formation (MARSC) and show that it generalizes the important Team Orienteering Problem with Time Windows. We propose a binary integer program and an anytime and scalable heuristic to solve it. Using public London Fire Brigade records, we create a dataset with 347588 tasks and a test framework that simulates the mobilization of firefighters. In problems with up to 150 agents and 3000 tasks, our heuristic finds solutions up to 3.25 times better than the Earliest Deadline First approach commonly used in real-time systems. Our results constitute the first large-scale benchmark for the MARSC problem.
Noe: Norms Emergence and Robustness Based on Emotions in Multiagent Systems
Tzeng, Sz-Ting, Ajmeri, Nirav, Singh, Munindar P.
Social norms characterize collective and acceptable group conducts in human society. Furthermore, some social norms emerge from interactions of agents or humans. To achieve agent autonomy and make norm satisfaction explainable, we include emotions into the normative reasoning process, which evaluate whether to comply or violate a norm. Specifically, before selecting an action to execute, an agent observes the environment and infer the state and consequences with its internal states after norm satisfaction or violation of a social norm. Both norm satisfaction and violation provoke further emotions, and the subsequent emotions affect norm enforcement. This paper investigates how modeling emotions affect the emergence and robustness of social norms via social simulation experiments. We find that an ability in agents to consider emotional responses to the outcomes of norm satisfaction and violation (1) promote norm compliance; and (2) improve societal welfare.
Network-wide traffic signal control optimization using a multi-agent deep reinforcement learning
Li, Zhenning, Yu, Hao, Zhang, Guohui, Dong, Shangjia, Xu, Cheng-Zhong
Inefficient traffic control may cause numerous problems such as traffic congestion and energy waste. This paper proposes a novel multi-agent reinforcement learning method, named KS-DDPG (Knowledge Sharing Deep Deterministic Policy Gradient) to achieve optimal control by enhancing the cooperation between traffic signals. By introducing the knowledge-sharing enabled communication protocol, each agent can access to the collective representation of the traffic environment collected by all agents. The proposed method is evaluated through two experiments respectively using synthetic and real-world datasets. The comparison with state-of-the-art reinforcement learning-based and conventional transportation methods demonstrate the proposed KS-DDPG has significant efficiency in controlling large-scale transportation networks and coping with fluctuations in traffic flow. In addition, the introduced communication mechanism has also been proven to speed up the convergence of the model without significantly increasing the computational burden.
Learning to Coordinate via Multiple Graph Neural Networks
Xu, Zhiwei, Zhang, Bin, Bai, Yunpeng, Li, Dapeng, Fan, Guoliang
The collaboration between agents has gradually become an important topic in multi-agent systems. The key is how to efficiently solve the credit assignment problems. This paper introduces MGAN for collaborative multi-agent reinforcement learning, a new algorithm that combines graph convolutional networks and value-decomposition methods. MGAN learns the representation of agents from different perspectives through multiple graph networks, and realizes the proper allocation of attention between all agents. We show the amazing ability of the graph network in representation learning by visualizing the output of the graph network, and therefore improve interpretability for the actions of each agent in the multi-agent system.
Shaping Advice in Deep Multi-Agent Reinforcement Learning
Xiao, Baicen, Ramasubramanian, Bhaskar, Poovendran, Radha
Multi-agent reinforcement learning involves multiple agents interacting with each other and a shared environment to complete tasks. When rewards provided by the environment are sparse, agents may not receive immediate feedback on the quality of actions that they take, thereby affecting learning of policies. In this paper, we propose a method called Shaping Advice in deep Multi-agent reinforcement learning (SAM) to augment the reward signal from the environment with an additional reward termed shaping advice. The shaping advice is given by a difference of potential functions at consecutive time-steps. Each potential function is a function of observations and actions of the agents. The shaping advice needs to be specified only once at the start of training, and can be easily provided by non-experts. We show through theoretical analyses and experimental validation that shaping advice provided by SAM does not distract agents from completing tasks specified by the environment reward. Theoretically, we prove that convergence of policy gradients and value functions when using SAM implies convergence of these quantities in the absence of SAM. Experimentally, we evaluate SAM on three tasks in the multi-agent Particle World environment that have sparse rewards. We observe that using SAM results in agents learning policies to complete tasks faster, and obtain higher rewards than: i) using sparse rewards alone; ii) a state-of-the-art reward redistribution method.
The Gradient Convergence Bound of Federated Multi-Agent Reinforcement Learning with Efficient Communication
Xu, Xing, Li, Rongpeng, Zhao, Zhifeng, Zhang, Honggang
The paper considers a distributed version of deep reinforcement learning (DRL) for multi-agent decision-making process in the paradigm of federated learning. Since the deep neural network models in federated learning are trained locally and aggregated iteratively through a central server, frequent information exchange incurs a large amount of communication overheads. Besides, due to the heterogeneity of agents, Markov state transition trajectories from different agents are usually unsynchronized within the same time interval, which will further influence the convergence bound of the aggregated deep neural network models. Therefore, it is of vital importance to reasonably evaluate the effectiveness of different optimization methods. Accordingly, this paper proposes a utility function to consider the balance between reducing communication overheads and improving convergence performance. Meanwhile, this paper develops two new optimization methods on top of variation-aware periodic averaging methods: 1) the decay-based method which gradually decreases the weight of the model's local gradients within the progress of local updating, and 2) the consensus-based method which introduces the consensus algorithm into federated learning for the exchange of the model's local gradients. This paper also provides novel convergence guarantees for both developed methods and demonstrates their effectiveness and efficiency through theoretical analysis and numerical simulation results.
Learning to Robustly Negotiate Bi-Directional Lane Usage in High-Conflict Driving Scenarios
Killing, Christoph, Villaflor, Adam, Dolan, John M.
Recently, autonomous driving has made substantial progress in addressing the most common traffic scenarios like intersection navigation and lane changing. However, most of these successes have been limited to scenarios with well-defined traffic rules and require minimal negotiation with other vehicles. In this paper, we introduce a previously unconsidered, yet everyday, high-conflict driving scenario requiring negotiations between agents of equal rights and priorities. There exists no centralized control structure and we do not allow communications. Therefore, it is unknown if other drivers are willing to cooperate, and if so to what extent. We train policies to robustly negotiate with opposing vehicles of an unobservable degree of cooperativeness using multi-agent reinforcement learning (MARL). We propose Discrete Asymmetric Soft Actor-Critic (DASAC), a maximum-entropy off-policy MARL algorithm allowing for centralized training with decentralized execution. We show that using DASAC we are able to successfully negotiate and traverse the scenario considered over 99% of the time. Our agents are robust to an unknown timing of opponent decisions, an unobservable degree of cooperativeness of the opposing vehicle, and previously unencountered policies. Furthermore, they learn to exhibit human-like behaviors such as defensive driving, anticipating solution options and interpreting the behavior of other agents.
Collaborative Agent Gameplay in the Pandemic Board Game
Sfikas, Konstantinos, Liapis, Antonios
Academic research in board game playing AI has of course moved While artificial intelligence has been applied to control players' beyond most pedestrian board games, applying a diverse set of decisions in board games for over half a century, little attention algorithms for playing card games with millions of card combinations is given to games with no player competition. Pandemic is an exemplar such as Magic: the Gathering (Wizards of the Coast, 1993) [3], collaborative board game where all players coordinate to games of tactical card placement such as Lords of War (Black Box, overcome challenges posed by events occurring during the game's 2012) [19] and Carcassonne (Hans im Glück, 2000) [9], card games progression. This paper proposes an artificial agent which controls of team-based competition such as Hanabi (Abacusspiele, 2010) [26] all players' actions and balances chances of winning versus risk or Codenames (Czech Games Edition, 2015) [22], and many more. of losing in this highly stochastic environment. The agent applies Traditional board games such as chess [15] and backgammon a Rolling Horizon Evolutionary Algorithm on an abstraction of [23], as well as recent card games such as Race for the Galaxy (Rio the game-state that lowers the branching factor and simulates the Grande, 2007) [6] or digitized board games such as Hearthstone game's stochasticity. Results show that the proposed algorithm (Blizzard, 2014) [11, 18], focus on players competing to deplete another can find winning strategies more consistently in different games player's resources (pawns, hit points) or to accumulate more of varying difficulty.