Agents
Communication-Efficient and Decentralized Multi-Task Boosting while Learning the Collaboration Graph
Zantedeschi, Valentina, Bellet, Aurélien, Tommasi, Marc
In the era of big data, the classical paradigm is to build huge data centers to collect and process users' data. This centralized access to resources and datasets simplifies some procedures, such as building predictive models with machine learning, but also comes with important drawbacks. From the company point of view, the need to gather and analyze the data centrally induces high infrastructure costs. As the server represents a single point of entry, it must also be secure enough to prevent attacks that could put the entire user database in jeopardy. On the user end, disadvantages include limited control over one's personal data as well as possible privacy risks, which may come from the aforementioned attacks but also from potentially loose data governance policies on the part of the companies.
Distributed Policy Iteration for Scalable Approximation of Cooperative Multi-Agent Policies
Phan, Thomy, Schmid, Kyrill, Belzner, Lenz, Gabor, Thomas, Feld, Sebastian, Linnhoff-Popien, Claudia
Decision making in multi-agent systems (MAS) is a great challenge due to enormous state and joint action spaces as well as uncertainty, making centralized control generally infeasible. Decentralized control offers better scalability and robustness but requires mechanisms to coordinate on joint tasks and to avoid conflicts. Common approaches to learn decentralized policies for cooperative MAS suffer from non-stationarity and lacking credit assignment, which can lead to unstable and uncoordinated behavior in complex environments. In this paper, we propose Strong Emergent Policy approximation (STEP), a scalable approach to learn strong decentralized policies for cooperative MAS with a distributed variant of policy iteration. For that, we use function approximation to learn from action recommendations of a decentralized multi-agent planning algorithm. STEP combines decentralized multi-agent planning with centralized learning, only requiring a generative model for distributed black box optimization. We experimentally evaluate STEP in two challenging and stochastic domains with large state and joint action spaces and show that STEP is able to learn stronger policies than standard multi-agent reinforcement learning algorithms, when combining multi-agent open-loop planning with centralized function approximation. The learned policies can be reintegrated into the multi-agent planning process to further improve performance.
Emergent Linguistic Phenomena in Multi-Agent Communication Games
Graesser, Laura, Cho, Kyunghyun, Kiela, Douwe
In this work, we propose a computational framework in which agents equipped with communication capabilities simultaneously play a series of referential games, where agents are trained using deep reinforcement learning. We demonstrate that the framework mirrors linguistic phenomena observed in natural language: i) the outcome of contact between communities is a function of inter- and intra-group connectivity; ii) linguistic contact either converges to the majority protocol, or in balanced cases leads to novel creole languages of lower complexity; and iii) a linguistic continuum emerges where neighboring languages are more mutually intelligible than farther removed languages. We conclude that intricate properties of language evolution need not depend on complex evolved linguistic capabilities, but can emerge from simple social exchanges between perceptually-enabled agents playing communication games.
On Finding Local Nash Equilibria (and Only Local Nash Equilibria) in Zero-Sum Games
Mazumdar, Eric V., Jordan, Michael I., Sastry, S. Shankar
We propose local symplectic surgery, a two-timescale procedure for finding local Nash equilibria in two-player zero-sum games. We first show that previous gradient-based algorithms cannot guarantee convergence to local Nash equilibria due to the existence of non-Nash stationary points. By taking advantage of the differential structure of the game, we construct an algorithm for which the local Nash equilibria are the only attracting fixed points. We also show that the algorithm exhibits no oscillatory behaviors in neighborhoods of equilibria and show that it has the same per-iteration complexity as other recently proposed algorithms. We conclude by validating the algorithm on two numerical examples: a toy example with multiple Nash equilibria and a non-Nash equilibrium, and the training of a small generative adversarial network (GAN).
Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning
Ahilan, Sanjeevan, Dayan, Peter
We investigate how reinforcement learning agents can learn to cooperate. Drawing inspiration from human societies, in which successful coordination of many individuals is often facilitated by hierarchical organisation, we introduce Feudal Multi-agent Hierarchies (FMH). In this framework, a 'manager' agent, which is tasked with maximising the environmentally-determined reward function, learns to communicate subgoals to multiple, simultaneously-operating, 'worker' agents. Workers, which are rewarded for achieving managerial subgoals, take concurrent actions in the world. We outline the structure of FMH and demonstrate its potential for decentralised learning and control. We find that, given an adequate set of subgoals from which to choose, FMH performs, and particularly scales, substantially better than cooperative approaches that use a shared reward function.
The Multi-Agent Reinforcement Learning in Malm\"O (MARL\"O) Competition
Perez-Liebana, Diego, Hofmann, Katja, Mohanty, Sharada Prasanna, Kuno, Noburu, Kramer, Andre, Devlin, Sam, Gaina, Raluca D., Ionita, Daniel
Learning in multi-agent scenarios is a fruitful research direction, but current approaches still show scalability problems in multiple games with general reward settings and different opponent types. The Multi-Agent Reinforcement Learning in Malm\"O (MARL\"O) competition is a new challenge that proposes research in this domain using multiple 3D games. The goal of this contest is to foster research in general agents that can learn across different games and opponent types, proposing a challenge as a milestone in the direction of Artificial General Intelligence.
Thirty Years of Machine Learning:The Road to Pareto-Optimal Next-Generation Wireless Networks
Wang, Jingjing, Jiang, Chunxiao, Zhang, Haijun, Ren, Yong, Chen, Kwang-Cheng, Hanzo, Lajos
Next-generation wireless networks (NGWN) have a substantial potential in terms of supporting a broad range of complex compelling applications both in military and civilian fields, where the users are able to enjoy high-rate, low-latency, low-cost and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making because of the complex heterogeneous nature of the network structures and wireless services. Machine learning algorithms have great success in supporting big data analytics, efficient parameter estimation and interactive decision making. Hence, in this article, we review the thirty-year history of machine learning by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning, respectively. Furthermore, we investigate their employment in the compelling applications of NGWNs, including heterogeneous networks (HetNets), cognitive radios (CR), Internet of things (IoT), machine to machine networks (M2M), and so on. This article aims for assisting the readers in clarifying the motivation and methodology of the various machine learning algorithms, so as to invoke them for hitherto unexplored services as well as scenarios of future wireless networks.
Open-ended Learning in Symmetric Zero-sum Games
Balduzzi, David, Garnelo, Marta, Bachrach, Yoram, Czarnecki, Wojciech M., Perolat, Julien, Jaderberg, Max, Graepel, Thore
Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them `winner' and `loser'. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective -- we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games, and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives.
Cooperative Online Learning: Keeping your Neighbors Updated
Cesa-Bianchi, Nicolò, Cesari, Tommaso R., Monteleoni, Claire
We introduce and analyze a cooperative online learning setting in which a network of agents solve a common online convex optimization problem by sharing feedback with their network neighbors. Agents do not have to be synchronized. At each time step, only some of the agents are requested to make a prediction and pay the corresponding loss: we call these agents "active". As the feedback (i.e., the current loss function) received by the active agents is communicated to their neighbors, both active agents and their neighbors can use the feedback to update their local models. Asynchronous online learning settings with communication constraints naturally arise in many applications. Forexample, large-scale learning systems are often geographically distributed, and in domains such as finance or online advertising, typically each agent must serve high volumes of prediction requests. If agents keep updating their local models in an online fashion, then bandwidth and computational constraints may force them to limit communication by sharing feedbacks only with their neighbors.
Hierarchical Reinforcement Learning for Multi-agent MOBA Game
Zhang, Zhijian, Li, Haozheng, Zhang, Luo, Zheng, Tianyin, Zhang, Ting, Hao, Xiong, Chen, Xiaoxin, Chen, Min, Xiao, Fangxu, Zhou, Wei
Although deep reinforcement learning has achieved great success recently, there are still challenges in Real Time Strategy (RTS) games. Due to its large state and action space, as well as hidden information, RTS games require macro strategies as well as micro level manipulation to obtain satisfactory performance. In this paper, we present a novel hierarchical reinforcement learning model for mastering Multiplayer Online Battle Arena (MOBA) games, a sub-genre of RTS games. In this hierarchical framework, agents make macro strategies by imitation learning and do micromanipulations through reinforcement learning. Moreover, we propose a simple self-learning method to get better sample efficiency for reinforcement part and extract some global features by multi-target detection method in the absence of game engine or API. In 1v1 mode, our agent successfully learns to combat and defeat built-in AI with 100\% win rate, and experiments show that our method can create a competitive multi-agent for a kind of mobile MOBA game King of Glory (KOG) in 5v5 mode.