Goto

Collaborating Authors

 Agent Societies


A Review of Cooperative Multi-Agent Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Deep Reinforcement Learning has made significant progress in multi-agent systems in recent years. In this review article, we have mostly focused on recent papers on Multi-Agent Reinforcement Learning (MARL) than the older papers, unless it was necessary. Several ideas and papers are proposed with different notations, and we tried our best to unify them with a single notation and categorize them by their relevance. In particular, we have focused on five common approaches on modeling and solving multi-agent reinforcement learning problems: (I) independent-learners, (II) fully observable critic, (III) value function decomposition, (IV) consensus, (IV) learn to communicate. Moreover, we discuss some new emerging research areas in MARL along with the relevant recent papers. In addition, some of the recent applications of MARL in real world are discussed. Finally, a list of available environments for MARL research are provided and the paper is concluded with proposals on the possible research directions.


Large-scale Traffic Signal Control Using a Novel Multi-Agent Reinforcement Learning

arXiv.org Machine Learning

Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multi-Agent Reinforcement Learning (MARL) is a promising method to solve this problem. However, there is still room for improvement in extending to large-scale problems and modeling the behaviors of other agents for each individual agent. In this paper, a new MARL, called Cooperative double Q-learning (Co-DQL), is proposed, which has several prominent features. It uses a highly scalable independent double Q-learning method based on double estimators and the UCB policy, which can eliminate the over-estimation problem existing in traditional independent Q-learning while ensuring exploration. It uses mean field approximation to model the interaction among agents, thereby making agents learn a better cooperative strategy. In order to improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method. In addition, we analyze the convergence properties of the proposed algorithm. Co-DQL is applied on TSC and tested on a multi-traffic signal simulator. According to the results obtained on several traffic scenarios, Co- DQL outperforms several state-of-the-art decentralized MARL algorithms. It can effectively shorten the average waiting time of the vehicles in the whole road system.



Action Semantics Network: Considering the Effects of Actions in Multiagent Systems

arXiv.org Artificial Intelligence

In multiagent systems (MASs), each agent makes individual decisions but all of them contribute globally to the system evolution. Learning in MASs is difficult since the selection of actions must take place in the presence of other co-learning agents. Moreover, the environmental stochasticity and uncertainties increase exponentially with the number of agents. A number of previous works borrow various multiagent coordination mechanisms into deep multiagent learning architecture to facilitate multiagent coordination. However, none of them explicitly consider action semantics between agents. In this paper, we propose a novel network architecture, named Action Semantics Network (ASN), that explicitly represents such action semantics between agents. ASN characterizes different actions' influence on other agents using neural networks based on the action semantics between agents. ASN can be easily combined with existing deep reinforcement learning (DRL) algorithms to boost their performance. Experimental results on StarCraft II and Neural MMO show ASN significantly improves the performance of state-of-the-art DRL approaches compared with a number of network architectures.


Arena: a toolkit for Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

We introduce Arena, a toolkit for multi-agent reinforcement learning (MARL) research. In MARL, it usually requires customizing observations, rewards and actions for each agent, changing cooperative-competitive agent-interaction, and playing with/against a third-party agent, etc. We provide a novel modular design, called Interface, for manipulating such routines in essentially two ways: 1) Different interfaces can be concatenated and combined, which extends the OpenAI Gym Wrappers concept to MARL scenarios. 2) During MARL training or testing, interfaces can be embedded in either wrapped OpenAI Gym compatible Environments or raw environment compatible Agents. We offer off-the-shelf interfaces for several popular MARL platforms, including StarCraft II, Pommerman, ViZDoom, Soccer, etc. The interfaces effectively support self-play RL and cooperative-competitive hybrid MARL. Also, Arena can be conveniently extended to your own favorite MARL platform.


Interpretable Modelling of Driving Behaviors in Interactive Driving Scenarios based on Cumulative Prospect Theory

arXiv.org Artificial Intelligence

Understanding human driving behavior is important for autonomous vehicles. In this paper, we propose an interpretable human behavior model in interactive driving scenarios based on the cumulative prospect theory (CPT). As a non-expected utility theory, CPT can well explain some systematically biased or ``irrational'' behavior/decisions of human that cannot be explained by the expected utility theory. Hence, the goal of this work is to formulate the human drivers' behavior generation model with CPT so that some ``irrational'' behavior or decisions of human can be better captured and predicted. Towards such a goal, we first develop a CPT-driven decision-making model focusing on driving scenarios with two interacting agents. A hierarchical learning algorithm is proposed afterward to learn the utility function, the value function, and the decision weighting function in the CPT model. A case study for roundabout merging is also provided as verification. With real driving data, the prediction performances of three different models are compared: a predefined model based on time-to-collision (TTC), a learning-based model based on neural networks, and the proposed CPT-based model. The results show that the proposed model outperforms the TTC model and achieves similar performance as the learning-based model with much less training data and better interpretability.


Prioritized Guidance for Efficient Multi-Agent Reinforcement Learning Exploration

arXiv.org Machine Learning

Exploration efficiency is a challenging problem in multi-agent reinforcement learning (MARL), as the policy learned by confederate MARL depends on the collaborative approach among multiple agents. Another important problem is the less informative reward restricts the learning speed of MARL compared with the informative label in supervised learning. In this work, we leverage on a novel communication method to guide MARL to accelerate exploration and propose a predictive network to forecast the reward of current state-action pair and use the guidance learned by the predictive network to modify the reward function. An improved prioritized experience replay is employed to better take advantage of the different knowledge learned by different agents which utilizes Time-difference (TD) error more effectively. Experimental results demonstrates that the proposed algorithm outperforms existing methods in cooperative multi-agent environments. We remark that this algorithm can be extended to supervised learning to speed up its training.


Measuring the Vulnerability of a Multi-Agent Pathfinding Solution

AAAI Conferences

Multi-agent pathfinding is the problem of finding a non-interfering paths for a set of agents, such that if the agents follow these paths then each agent will reach its desired destination. Recent years have shown tremendous advances in this field, with optimal and suboptimal algorithms that are able to plan paths for over 100 agents in reasonable time. However, autonomous mobile agents are prime targets for cyber-security attacks, where an adversary may take control over an agent to disrupt the agents execution of their plan. This threat raises two questions. The first question is how much damage can an agent do if it does not follow its plan. The second question is how can one plan a-priori to be as robust as possible to such cyber-attacks. In this work, We provide an answer to both questions. To compute the maximal amount of damage that an adversary agent can do, we define a corresponding graph search problem and solve this problem with A*. Then, we provide a very simple method to choose a solution that is robust to such damages. We demonstrate both algorithms in simulation over standard multi-agent pathfinding domains.


Generalized Target Assignment and Path Finding Using Answer Set Programming

AAAI Conferences

In Multi-Agent Path Finding (MAPF), a team of agents needs to find collision-free paths from their starting locations to their respective targets. Combined Target Assignment and Path Finding (TAPF) extends MAPF by including the problem of assigning targets to agents as a precursor to the MAPF problem. A limitation of both models is their assumption that the number of agents and targets are equal, which is invalid in some applications. We address this limitation by generalizing TAPF to allow for (1) unequal number of agents and tasks; (2) tasks to have deadlines by which they must be completed; (3) ordering of groups of tasks to be completed; and (4) tasks that are composed of a sequence of checkpoints that must be visited in a specific order. Further, we model the problem using answer set programming (ASP) to show that customizing the desired variant of the problem is simple -- one only needs to choose the appropriate combination of ASP rules to enforce it. We also demonstrate experimentally that if problem specific information can be incorporated into the ASP encoding then ASP based methods can be efficient and can scale up to solve practical applications.


An Improved Algorithm for Optimal Coalition Structure Generation

AAAI Conferences

The Coalition Structure Generation (CSG) problem is a partitioning of a set of agents into exhaustive and disjoint coalitions to maximize social welfare. This NP-complete problem arises in many practical scenarios. Prominent examples are included in the field of transportation, e-Commerce, distributed sensor networks, and others. The fastest exact algorithm to solve the CSG problem is ODP-IP, which is a hybrid version of two previously established algorithms, namely Improved Dynamic Programming (IDP) and IP. In this paper, we show that the ODP-IP algorithm performs many redundant operations. To improve ODP-IP, we propose a faster abortion mechanism to speed up IP’s search. Our abortion mechanism decides at runtime which of the IP's operations are redundant to skip them. Then, we propose a modified version of IDP (named MIDP) and an improved version of IP (named IIP). Based on these two improved algorithms, we develop a hybrid version (MIDP-IIP) to solve the CSG problem. After a detailed description of the new algorithm MIDP-IIP, an experimental comparison is conducted against ODP-IP. Our analysis shows that MIDP-IIP performs fewer operations than ODP-IP. In addition, MIDP-IIP reduced significantly many problem instances running times (11% to 37 %), and improved drastically some of them.