Agent Societies
Using Reinforcement Learning to Herd a Robotic Swarm to a Target Distribution
Kakish, Zahi M., Elamvazhuthi, Karthik, Berman, Spring
In this paper, we present a reinforcement learning approach to designing a control policy for a "leader" agent that herds a swarm of "follower" agents, via repulsive interactions, as quickly as possible to a target probability distribution over a strongly connected graph. The leader control policy is a function of the swarm distribution, which evolves over time according to a mean-field model in the form of an ordinary difference equation. The dependence of the policy on agent populations at each graph vertex, rather than on individual agent activity, simplifies the observations required by the leader and enables the control strategy to scale with the number of agents. Two Temporal-Difference learning algorithms, SARSA and Q-Learning, are used to generate the leader control policy based on the follower agent distribution and the leader's location on the graph. A simulation environment corresponding to a grid graph with 4 vertices was used to train and validate the control policies for follower agent populations ranging from 10 to 100. Finally, the control policies trained on 100 simulated agents were used to successfully redistribute a physical swarm of 10 small robots to a target distribution among 4 spatial regions.
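The two Temporal-Difference methods named in this abstract differ only in how they bootstrap from the next state. The following is a minimal tabular sketch of that difference, not the authors' implementation: the state encoding (leader vertex plus follower counts per vertex), the action names, and the values of `alpha` and `gamma` are all illustrative assumptions.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy TD update: bootstrap from the best action available in s_next."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update: bootstrap from the action actually chosen in s_next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])


# Hypothetical encoding: (leader vertex, follower counts on the 4 vertices).
ACTIONS = ["stay", "move"]
state = (0, (4, 3, 2, 1))
next_state = (1, (3, 4, 2, 1))

Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
q_learning_update(Q, state, "stay", 1.0, next_state, ACTIONS)
```

Because the table is keyed on population counts rather than on individual agents, its size depends on how the distribution is discretized, not on which particular follower occupies which vertex.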
A novel approach for multi-agent cooperative pursuit to capture grouped evaders
Qadir, Muhammad Zuhair, Piao, Songhao, Jiang, Haiyang, Souidi, Mohammed El Habib
We propose an approach to mobile multi-agent pursuit that combines a self-organizing feature map (SOFM) with reinforcement learning based on the agent group role membership function (AGRMF) model. The method promotes dynamic organization of the pursuers' groups and, using the SOFM and AGRMF techniques, lets the pursuers group the evaders according to their preferences. This addresses a shortcoming of the AGRMF model, in which pursuers cannot fully reorganize when the goal is too independent. We also discuss a new reward function. After the groups are formed, reinforcement learning is applied to obtain the optimal solution for each agent. The results of each step in the capturing process then feed back into the AGR membership function to speed up the convergence of the competitive neural network. The experimental results show that this approach is more effective for mobile agents capturing evaders.
Towards Understanding Linear Value Decomposition in Cooperative Multi-Agent Q-Learning
Wang, Jianhao, Ren, Zhizhou, Han, Beining, Zhang, Chongjie
Value decomposition is a popular and promising approach to scaling up multi-agent reinforcement learning in cooperative settings. However, the theoretical understanding of such methods is limited. In this paper, we introduce a variant of the fitted Q-iteration framework for analyzing multi-agent Q-learning with value decomposition. Based on this framework, we derive a closed-form solution to the Bellman error minimization with linear value decomposition. With this novel solution, we further reveal two interesting insights: 1) linear value decomposition implicitly implements a classical multi-agent credit assignment called counterfactual difference rewards; and 2) multi-agent Q-learning with linear value decomposition requires on-policy data distribution to achieve numerical stability. In the empirical study, our experiments demonstrate the realizability of our theoretical implications in a broad set of complicated tasks. They show that most state-of-the-art deep multi-agent Q-learning algorithms using linear value decomposition cannot efficiently utilize off-policy samples, which may even lead to an unbounded divergence.
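As a rough illustration of what a linear value decomposition looks like, the sketch below assumes the simplest additive form, in which the joint value is the sum of per-agent values (the form used by VDN-style methods). The function names and the toy numbers are hypothetical. It shows one practical consequence of additivity: the greedy joint action can be found by letting each agent maximize its own term, with no search over the joint action space.

```python
from itertools import product


def joint_q(local_qs, joint_action):
    """Additive decomposition: Q_tot(s, a) = sum_i Q_i(s, a_i) for one fixed state s."""
    return sum(q[a] for q, a in zip(local_qs, joint_action))


def greedy_joint_action(local_qs):
    """Decentralized greedy choice: each agent takes the argmax of its own Q_i."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in local_qs)


def brute_force_joint_action(local_qs):
    """Centralized reference: search every joint action (exponential in agent count)."""
    joint_actions = product(*(range(len(q)) for q in local_qs))
    return max(joint_actions, key=lambda ja: joint_q(local_qs, ja))


# Toy per-agent action values for a single state (illustrative numbers only).
local_qs = [[0.1, 0.7, 0.3], [1.2, -0.4], [0.0, 0.5, 0.2]]
assert greedy_joint_action(local_qs) == brute_force_joint_action(local_qs)
```

The brute-force search is included only as a check; it grows exponentially with the number of agents, which is the cost the decomposition avoids.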
Emergent cooperation through mutual information maximization
Cuervo, Santiago, Alzate, Marco
With artificial intelligence systems becoming ubiquitous in our society, their designers will soon have to consider their social dimension, as many of these systems will have to interact with one another to work efficiently. With this in mind, we propose a decentralized deep reinforcement learning algorithm for the design of cooperative multi-agent systems. The algorithm is based on the hypothesis that highly correlated actions are a feature of cooperative systems, and hence, we propose the insertion of an auxiliary objective of maximization of the mutual information between the actions of agents in the learning problem. Our system is applied to a social dilemma, a problem whose optimal solution requires that agents cooperate to maximize a macroscopic performance function despite the divergent individual objectives of each agent. By comparing the performance of the proposed system to a system without the auxiliary objective, we conclude that the maximization of mutual information among agents promotes the emergence of cooperation in social dilemmas.
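To make the quantity being maximized concrete, here is a small plug-in estimate of the mutual information between two agents' discrete actions, computed from logged action pairs. It is a sketch for intuition only: the abstract does not say how the authors estimate or optimize this term inside a deep reinforcement learning loss, and the function name and sample data are assumptions.

```python
from collections import Counter
from math import log


def mutual_information(action_pairs):
    """Plug-in estimate of I(A; B) in nats from a list of (a, b) action samples."""
    n = len(action_pairs)
    joint = Counter(action_pairs)
    marg_a = Counter(a for a, _ in action_pairs)
    marg_b = Counter(b for _, b in action_pairs)
    # p(a, b) / (p(a) p(b)) simplifies to count * n / (count_a * count_b).
    return sum(
        (c / n) * log(c * n / (marg_a[a] * marg_b[b]))
        for (a, b), c in joint.items()
    )


# Perfectly correlated actions carry log(2) nats; independent ones carry none.
correlated = [(0, 0), (1, 1)] * 50
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
print(mutual_information(correlated), mutual_information(independent))
```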
Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning
Wang, Lingxiao, Yang, Zhuoran, Wang, Zhaoran
Multi-agent reinforcement learning (MARL) has achieved significant empirical success. However, MARL suffers from the curse of many agents. In this paper, we exploit the symmetry of agents in MARL. In the most generic form, we study a mean-field MARL problem. Such a mean-field MARL problem is defined on mean-field states, which are distributions supported on a continuous space. Based on the mean embedding of the distributions, we propose the MF-FQI algorithm, which solves the mean-field MARL problem, and establish a non-asymptotic analysis for it. We highlight that the MF-FQI algorithm enjoys a "blessing of many agents" property in the sense that a larger number of observed agents improves its performance.
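The idea of representing a population by a mean embedding can be sketched with a finite feature map: average the features of the observed agents' states. This is an illustration under assumptions, not the MF-FQI algorithm itself. The scalar states, the radial basis features, the `centers`, and the `bandwidth` are all hypothetical choices.

```python
from math import exp


def rbf_features(x, centers, bandwidth=1.0):
    """Feature map for one scalar state: one Gaussian bump per center."""
    return [exp(-((x - c) ** 2) / (2 * bandwidth ** 2)) for c in centers]


def mean_embedding(states, centers, bandwidth=1.0):
    """Empirical mean embedding: average the feature vectors of all observed agents."""
    feats = [rbf_features(x, centers, bandwidth) for x in states]
    n = len(feats)
    return [sum(column) / n for column in zip(*feats)]


centers = [0.0, 1.0, 2.0]
embedding = mean_embedding([0.0, 1.0, 2.0], centers)
shuffled = mean_embedding([2.0, 0.0, 1.0], centers)
# Reordering the agents leaves the embedding unchanged (up to rounding).
assert all(abs(u - v) < 1e-12 for u, v in zip(embedding, shuffled))
```

The output has a fixed length regardless of how many agents are observed, and it depends only on the empirical distribution of states, so observing more agents refines the estimate instead of enlarging the input.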
Deep Implicit Coordination Graphs for Multi-agent Reinforcement Learning
Li, Sheng, Gupta, Jayesh K., Morales, Peter, Allen, Ross, Kochenderfer, Mykel J.
Multi-agent reinforcement learning (MARL) requires coordination to efficiently solve certain tasks. Fully centralized control is often infeasible in such domains due to the size of joint action spaces. Coordination-graph-based formalizations allow reasoning about the joint action based on the structure of interactions. However, they often require domain expertise in their design. This paper introduces the deep implicit coordination graph (DICG) architecture for such scenarios. DICG consists of a module for inferring the dynamic coordination graph structure, which is then used by a graph-neural-network-based module to learn to implicitly reason about the joint actions or values. DICG allows learning the tradeoff between full centralization and decentralization via standard actor-critic methods to significantly improve coordination for domains with a large number of agents. We apply DICG to both centralized-training-centralized-execution and centralized-training-decentralized-execution regimes. We demonstrate that DICG solves the relative overgeneralization pathology in predator-prey tasks and outperforms various MARL baselines on the challenging StarCraft II Multi-agent Challenge (SMAC) and traffic junction environments.
Contextual and Possibilistic Reasoning for Coalition Formation
Bikakis, Antonis, Caire, Patrice
In multiagent systems, agents often have to rely on other agents to reach their goals, for example when they lack a needed resource or do not have the capability to perform a required action. Agents therefore need to cooperate. Then, some of the questions raised are: Which agent(s) to cooperate with? What are the potential coalitions in which agents can achieve their goals? As the number of possibilities is potentially quite large, how to automate the process? And then, how to select the most appropriate coalition, taking into account the uncertainty in the agents' abilities to carry out certain tasks? In this article, we address the question of how to find and evaluate coalitions among agents in multiagent systems using MCS tools, while taking into consideration the uncertainty around the agents' actions. Our methodology is the following: We first compute the solution space for the formation of coalitions using a contextual reasoning approach. Second, we model agents as contexts in Multi-Context Systems (MCS), and dependence relations among agents seeking to achieve their goals, as bridge rules. Third, we systematically compute all potential coalitions using algorithms for MCS equilibria, and given a set of functional and non-functional requirements, we propose ways to select the best solutions. Finally, in order to handle the uncertainty in the agents' actions, we extend our approach with features of possibilistic reasoning. We illustrate our approach with an example from robotics.
Cooperative Multi-Agent Reinforcement Learning with Partial Observations
Zhang, Yan, Zavlanos, Michael M.
In this paper, we propose a distributed zeroth-order policy optimization method for Multi-Agent Reinforcement Learning (MARL). Existing MARL algorithms often assume that every agent can observe the states and actions of all the other agents in the network. This can be impractical in large-scale problems, where sharing the state and action information with multi-hop neighbors may incur significant communication overhead. The advantage of the proposed zeroth-order policy optimization method is that it allows the agents to compute the local policy gradients needed to update their local policy functions using local estimates of the global accumulated rewards that depend on partial state and action information only and can be obtained using consensus. Specifically, to calculate the local policy gradients, we develop a new distributed zeroth-order policy gradient estimator that relies on one-point residual feedback. Compared to existing zeroth-order estimators that also rely on one-point feedback, it significantly reduces the variance of the policy gradient estimates and thereby improves learning performance. We show that the proposed distributed zeroth-order policy optimization method with constant stepsize converges to a neighborhood of the global optimal policy that depends on the number of consensus steps used to calculate the local estimates of the global accumulated rewards. Moreover, we provide numerical experiments that demonstrate that our new zeroth-order policy gradient estimator is more sample-efficient than other existing one-point estimators.
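A one-point residual-feedback estimator can be sketched in a single-agent, centralized setting as follows. This is a hedged illustration of the general idea only: it omits the distributed and consensus parts described in the abstract, it is written as cost minimization rather than reward maximization, and the Gaussian perturbations, the function name, and the parameters `delta` and `step` are assumptions. The key point is that each iteration queries the objective once and subtracts the value obtained at the previous perturbed point.

```python
import random


def residual_feedback_step(f, x, prev_value, delta, step, rng):
    """One descent step using a one-point residual-feedback gradient estimate.

    f is queried once, at x + delta * u; the previous query's value is reused
    as the baseline instead of querying f a second time.
    """
    u = [rng.gauss(0.0, 1.0) for _ in x]  # random search direction
    value = f([xi + delta * ui for xi, ui in zip(x, u)])
    grad = [(value - prev_value) / delta * ui for ui in u]
    x_new = [xi - step * gi for xi, gi in zip(x, grad)]
    return x_new, value


def cost(z):
    """Toy quadratic cost standing in for a negated accumulated reward."""
    return sum(zi * zi for zi in z)


rng = random.Random(0)
x, prev = [1.0, -1.0], cost([1.0, -1.0])
for _ in range(5):
    x, prev = residual_feedback_step(cost, x, prev, delta=0.1, step=0.01, rng=rng)
```

A plain one-point estimator would use `value / delta * u` with no baseline, so its magnitude scales with the objective value itself; subtracting the previous query keeps the estimate proportional to a difference of nearby values.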
Quantitatively Assessing the Benefits of Model-driven Development in Agent-based Modeling and Simulation
Santos, Fernando, Nunes, Ingrid, Bazzan, Ana L. C.
The agent-based modeling and simulation (ABMS) paradigm has been used to analyze, reproduce, and predict phenomena related to many application areas. Although there are many agent-based platforms that support simulation development, they rely on programming languages that require extensive programming knowledge. Model-driven development (MDD) has been explored to facilitate simulation modeling, by means of code generation and high-level modeling languages that provide reusable building blocks and hide computational complexity. However, there is still limited knowledge of how MDD approaches to ABMS contribute to increasing development productivity and quality. In this paper, we therefore present an empirical study that quantitatively compares the use of MDD and ABMS platforms, mainly in terms of effort and developer mistakes. Our evaluation was performed using MDD4ABMS (an MDD approach with a core and extensions to two application areas, one of which was developed for this study) and NetLogo, a widely used platform. The obtained results show that MDD4ABMS requires less effort to develop simulations with similar (sometimes better) design quality than NetLogo, giving evidence of the benefits that MDD can provide to ABMS.
Comparative Evaluation of Multi-Agent Deep Reinforcement Learning Algorithms
Papoudakis, Georgios, Christianos, Filippos, Schäfer, Lukas, Albrecht, Stefano V.
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we evaluate and compare three different classes of MARL algorithms (independent learners, centralised training with decentralised execution, and value decomposition) in a diverse range of multi-agent learning tasks. Our results show that (1) algorithm performance depends strongly on environment properties and no algorithm learns efficiently across all learning tasks; (2) independent learners often achieve equal or better performance than more complex algorithms; (3) tested algorithms struggle to solve multi-agent tasks with sparse rewards. We report detailed empirical data, including a reliability analysis, and provide insights into the limitations of the tested algorithms.