
Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Recent developments in deep reinforcement learning are concerned with creating decision-making agents that can perform well in various complex domains. A particular approach that has received increasing attention is multi-agent reinforcement learning, in which multiple agents learn concurrently to coordinate their actions. In such multi-agent environments, additional learning problems arise because the agents' decision-making policies are continually changing. This paper surveys recent works that address the non-stationarity problem in multi-agent deep reinforcement learning. The surveyed methods range from modifications of the training procedure, such as centralized training, to learning representations of the opponent's policy, meta-learning, communication, and decentralized learning. The survey concludes with a list of open problems and possible lines of future research.
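For intuition (this example is not taken from the survey), the sketch below shows the problem in its simplest form: an independent Q-learner in a repeated 2x2 coordination game treats the other agent as part of the environment, so when that agent's policy drifts over time, the reward distribution behind each of the learner's actions drifts with it and the stationarity assumption behind the Q-learning update no longer holds. The payoff matrix, drift schedule, and hyperparameters are illustrative assumptions.

```python
# Illustrative sketch (not from the survey): why an independent learner
# sees a non-stationary problem.  Agent 1 runs plain Q-learning in a 2x2
# coordination game while agent 2's policy drifts, so the reward behind
# each of agent 1's actions keeps changing.
import numpy as np

payoff = np.array([[1.0, 0.0],
                   [0.0, 1.0]])       # agent 1's payoff: rows = its action

rng = np.random.default_rng(0)
q = np.zeros(2)                       # agent 1's (stateless) action values
alpha, eps = 0.1, 0.1

for t in range(5000):
    p0 = 0.5 + 0.5 * np.sin(t / 500.0)          # agent 2's drifting policy
    a2 = 0 if rng.random() < p0 else 1
    a1 = rng.integers(2) if rng.random() < eps else int(np.argmax(q))
    r = payoff[a1, a2]
    q[a1] += alpha * (r - q[a1])                # stationary-MDP update, applied anyway
    if t % 1000 == 0:
        print(f"t={t:4d}  P(a2=0)={p0:.2f}  Q={np.round(q, 2)}")
```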


U of T, Vector Institute woo rising stars in machine learning field

#artificialintelligence

The University of Toronto and the affiliated Vector Institute for Artificial Intelligence have announced the recruitment of two rising stars in machine learning research as part of a continued drive to assemble the best AI talent in the world. Chris Maddison and Jakob Foerster will both come to U of T having completed their doctoral research at the University of Oxford. Maddison earned his undergraduate and master's degrees in computer science at U of T – the latter under the supervision of University Professor Emeritus Geoffrey Hinton. A senior research scientist at Google-owned AI firm DeepMind, he will join U of T's departments of computer science and statistical sciences in the Faculty of Arts & Science as an assistant professor next summer. Foerster, a research scientist at Facebook AI Research, will start as an assistant professor in the department of computer and mathematical sciences at U of T Scarborough in fall of 2020.


Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly with the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces non-stationarity that makes it incompatible with the experience replay memory on which deep Q-learning relies. This paper proposes two methods that address this problem: 1) using a multi-agent variant of importance sampling to naturally decay obsolete data and 2) conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.
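As a rough, non-authoritative illustration of the second method, the sketch below conditions a per-agent Q-network on its observation plus a small fingerprint (here, the fraction of training completed and the current exploration rate) and stores that fingerprint alongside each transition, so samples later drawn from the replay memory still carry the training stage at which they were generated. The network sizes and the exact fingerprint contents are assumptions, not the paper's precise choices.

```python
# Hedged sketch of fingerprint conditioning for replay in multi-agent RL.
# Dimensions and fingerprint contents are illustrative assumptions.
from collections import deque

import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS, FP_DIM = 8, 4, 2   # fingerprint = (training progress, epsilon)

class FingerprintQNet(nn.Module):
    """Per-agent Q-network conditioned on observation + fingerprint."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + FP_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, obs, fingerprint):
        return self.net(torch.cat([obs, fingerprint], dim=-1))

qnet = FingerprintQNet()
replay = deque(maxlen=10_000)

# The fingerprint is stored with each transition, so a sample drawn later
# still declares the stage of training at which it was generated.
obs, next_obs = torch.zeros(OBS_DIM), torch.zeros(OBS_DIM)
fp = torch.tensor([0.25, 0.30])        # 25% through training, epsilon = 0.3
action, reward = 2, 1.0
replay.append((obs, fp, action, reward, next_obs, fp))

print(qnet(obs, fp).detach())          # Q-values conditioned on the fingerprint
```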


Non-monotonic Value Function Factorization for Deep Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

In this paper, we propose actor-critic approaches that introduce an actor policy on top of QMIX [9], removing QMIX's monotonicity constraint and thereby implementing a non-monotonic value function factorization of the joint action-value. We evaluate our actor-critic methods on StarCraft II micromanagement tasks and show that they achieve stronger performance on maps with heterogeneous agent types.
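A minimal sketch of what removing the constraint can look like, using hypothetical module names rather than the authors' exact architecture: per-agent utilities are combined by an ordinary state-conditioned MLP whose mixing weights are unconstrained (and may be negative), and the resulting joint value can serve as a centralized critic for independently parameterized actors.

```python
# Hedged sketch, not the paper's architecture: an unconstrained
# (non-monotonic) mixer used as a centralised critic for per-agent actors.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, N_ACTIONS, STATE_DIM = 3, 10, 5, 30

class Actor(nn.Module):
    """Decentralised per-agent policy over discrete actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACTIONS))
    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

class NonMonotonicMixer(nn.Module):
    """Combines per-agent utilities with an ordinary MLP; unlike QMIX,
    the mixing weights are unconstrained and may be negative."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_AGENTS + STATE_DIM, 64),
                                 nn.ReLU(), nn.Linear(64, 1))
    def forward(self, agent_utils, state):
        return self.net(torch.cat([agent_utils, state], dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]
critics = [nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS)) for _ in range(N_AGENTS)]
mixer = NonMonotonicMixer()

obs = torch.zeros(N_AGENTS, OBS_DIM)
state = torch.zeros(STATE_DIM)
actions = [actors[i](obs[i]).sample() for i in range(N_AGENTS)]
# Per-agent utility of the chosen action, mixed into a joint value that
# can act as a centralised critic signal for the actors.
agent_utils = torch.stack([critics[i](obs[i])[actions[i]] for i in range(N_AGENTS)])
q_joint = mixer(agent_utils, state)
print([a.item() for a in actions], q_joint.item())
```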


QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

arXiv.org Machine Learning

In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations. We structurally enforce that the joint-action value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning, and guarantees consistency between the centralised and decentralised policies. We evaluate QMIX on a challenging set of StarCraft II micromanagement tasks, and show that QMIX significantly outperforms existing value-based multi-agent reinforcement learning methods.
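For concreteness, the following is a minimal PyTorch sketch of a QMIX-style monotonic mixer (layer sizes and names are illustrative, not the paper's exact architecture): hypernetworks map the global state to the mixing weights, and an absolute value keeps those weights non-negative, so with monotone activations the joint value is non-decreasing in every per-agent value and each agent's greedy action remains consistent with the joint greedy action.

```python
# Hedged sketch of a QMIX-style monotonic mixing network.  Sizes and
# names are illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_AGENTS, STATE_DIM, EMBED_DIM = 3, 30, 32

class MonotonicMixer(nn.Module):
    def __init__(self):
        super().__init__()
        # Hypernetworks: map the global state to mixing weights and biases.
        self.hyper_w1 = nn.Linear(STATE_DIM, N_AGENTS * EMBED_DIM)
        self.hyper_b1 = nn.Linear(STATE_DIM, EMBED_DIM)
        self.hyper_w2 = nn.Linear(STATE_DIM, EMBED_DIM)
        self.hyper_b2 = nn.Sequential(nn.Linear(STATE_DIM, EMBED_DIM),
                                      nn.ReLU(), nn.Linear(EMBED_DIM, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, N_AGENTS); state: (batch, STATE_DIM)
        w1 = torch.abs(self.hyper_w1(state)).view(-1, N_AGENTS, EMBED_DIM)
        b1 = self.hyper_b1(state).view(-1, 1, EMBED_DIM)
        hidden = F.elu(agent_qs.unsqueeze(1) @ w1 + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(-1, EMBED_DIM, 1)
        b2 = self.hyper_b2(state).view(-1, 1, 1)
        # Non-negative w1, w2 => dQ_tot/dQ_i >= 0 for every agent i.
        return (hidden @ w2 + b2).view(-1, 1)

mixer = MonotonicMixer()
q_tot = mixer(torch.rand(4, N_AGENTS), torch.rand(4, STATE_DIM))
print(q_tot.shape)   # torch.Size([4, 1])
```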