Agent Societies
An Optimal Itinerary Generation in a Configuration Space of Large Intellectual Agent Groups with Linear Logic
-- a group of intelligent agents which fulfill a set of tasks in parallel is represented first by the tensor multiplication of corresponding processes in a linear logic game category. An optimal itinerary in the configuration space of the group states is defined as a play with maximal total reward in the category. New moments also are: the reward is represented as a degree of certainty (visibility) of an agent goal, and the system goals are chosen by the greatest value corresponding to these processes in the system goal lattice. The artificial intelligence is represented in the Artificial General Intelligence (AGI) approach as an information processor which consumes and gives out information. Investigations in the field are focused on systems which act rationally. A formal description of the most intelligent agent (AIXI) behavior, in the sense of some intelligence measure, is suggested in AGI framework [1].
Multi-Agent Common Knowledge Reinforcement Learning
Foerster, Jakob N., de Witt, Christian A. Schroeder, Farquhar, Gregory, Torr, Philip H. S., Boehmer, Wendelin, Whiteson, Shimon
In multi-agent reinforcement learning, centralised policies can only be executed if agents have access to either the global state or an instantaneous communication channel. An alternative approach that circumvents this limitation is to use centralised training of a set of decentralised policies. However, such policies severely limit the agents' ability to coordinate. We propose multi-agent common knowledge reinforcement learning (MACKRL), which strikes a middle ground between these two extremes. Our approach is based on the insight that, even in partially observable settings, subsets of agents often have some common knowledge that they can exploit to coordinate their behaviour. Common knowledge can arise, e.g., if all agents can reliably observe things in their own field of view and know the field of view of other agents. Using this additional information, it is possible to find a centralised policy that conditions only on agents' common knowledge and that can be executed in a decentralised fashion. A resulting challenge is then to determine at what level agents should coordinate. While the common knowledge shared among all agents may not contain much valuable information, there may be subgroups of agents that share common knowledge useful for coordination. MACKRL addresses this challenge using a hierarchical approach: at each level, a controller can either select a joint action for the agents in a given subgroup, or propose a partition of the agents into smaller subgroups whose actions are then selected by controllers at the next level. While action selection involves sampling hierarchically, learning updates are based on the probability of the joint action, calculated by marginalising across the possible decisions of the hierarchy. We show promising results on both a proof-of-concept matrix game and a multi-agent version of StarCraft II Micromanagement.
Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning
Lin, Kaixiang, Zhao, Renyu, Xu, Zhe, Zhou, Jiayu
Large-scale online ride-sharing platforms have substantially transformed our lives by reallocating transportation resources to alleviate traffic congestion and promote transportation efficiency. An efficient fleet management strategy not only can significantly improve the utilization of transportation resources but also increase the revenue and customer satisfaction. It is a challenging task to design an effective fleet management strategy that can adapt to an environment involving complex dynamics between demand and supply. Existing studies usually work on a simplified problem setting that can hardly capture the complicated stochastic demand-supply variations in high-dimensional space. In this paper we propose to tackle the large-scale fleet management problem using reinforcement learning, and propose a contextual multi-agent reinforcement learning framework including two concrete algorithms, namely contextual deep Q-learning and contextual multi-agent actor-critic, to achieve explicit coordination among a large number of agents adaptive to different contexts. We show significant improvements of the proposed framework over state-of-the-art approaches through extensive empirical studies.
TarMAC: Targeted Multi-Agent Communication
Das, Abhishek, Gervet, Théophile, Romoff, Joshua, Batra, Dhruv, Parikh, Devi, Rabbat, Michael, Pineau, Joelle
We explore a collaborative multi-agent reinforcement learning setting where a team of agents attempts to solve cooperative tasks in partially-observable environments. In this scenario, learning an effective communication protocol is key. We propose a communication architecture that allows for targeted communication, where agents learn both what messages to send and who to send them to, solely from downstream task-specific reward without any communication supervision. Additionally, we introduce a multi-stage communication approach where the agents co-ordinate via multiple rounds of communication before taking actions in the environment. We evaluate our approach on a diverse set of cooperative multi-agent tasks, of varying difficulties, with varying number of agents, in a variety of environments ranging from 2D grid layouts of shapes and simulated traffic junctions to complex 3D indoor environments. We demonstrate the benefits of targeted as well as multi-stage communication. Moreover, we show that the targeted communication strategies learned by agents are both interpretable and intuitive.
Graph Convolutional Reinforcement Learning for Multi-Agent Cooperation
Jiang, Jiechuan, Dun, Chen, Lu, Zongqing
Learning to cooperate is crucially important in multi-agent reinforcement learning. The key is to take the influence of other agents into consideration when performing distributed decision making. However, multi-agent environment is highly dynamic, which makes it hard to learn abstract representations of influences between agents by only low-order features that existing methods exploit. In this paper, we propose a graph convolutional model for multi-agent cooperation. The graph convolution architecture adapts to the dynamics of the underlying graph of the multi-agent environment, where the influence among agents is captured by their abstract relation representations. High-order features extracted by relation kernels of convolutional layers from gradually increased receptive fields are exploited to learn cooperative strategies. The gradient of an agent not only backpropagates to itself but also to other agents in its receptive fields to reinforce the learned cooperative strategies. Moreover, the relation representations are temporally regularized to make the cooperation more consistent. Empirically, we show that our model enables agents to develop more cooperative and sophisticated strategies than existing methods in jungle and battle games and routing in packet switching networks.
Planification par fusions incr\'ementales de graphes
Pellier, Damien, Belaidi, lias.
In this paper, we introduce a generic and fresh model for distributed planning called "Distributed Planning Through Graph Merging" ({\sf DPGM}). This model unifies the different steps of the distributed planning process into a single step. Our approach is based on a planning graph structure for the agent reasoning and a CSP mechanism for the individual plan extraction and the coordination. We assume that no agent can reach the global goal alone. Therefore the agents must cooperate, {\it i.e.,} take in into account potential positive interactions between their activities to reach their common shared goal. The originality of our model consists in considering as soon as possible, {\it i.e.,} in the individual planning process, the positive and the negative interactions between agents activities in order to reduce the search cost of a global coordinated solution plan.
Is multiagent deep reinforcement learning the answer or the question? A brief survey
Hernandez-Leal, Pablo, Kartal, Bilal, Taylor, Matthew E.
Deep reinforcement learning (DRL) has achieved outstanding results in recent years. This has led to a dramatic increase in the number of applications and methods. Recent works have explored learning beyond single-agent scenarios and have considered multiagent scenarios. Initial results report successes in complex multiagent domains, although there are several challenges to be addressed. In this context, first, this article provides a clear overview of current multiagent deep reinforcement learning (MDRL) literature. Second, it provides guidelines to complement this emerging area by (i) showcasing examples on how methods and algorithms from DRL and multiagent learning (MAL) have helped solve problems in MDRL and (ii) providing general lessons learned from these works. We expect this article will help unify and motivate future research to take advantage of the abundant literature that exists in both areas (DRL and MAL) in a joint effort to promote fruitful research in the multiagent community.
Inequity aversion improves cooperation in intertemporal social dilemmas
Hughes, Edward, Leibo, Joel Z., Phillips, Matthew G., Tuyls, Karl, Duéñez-Guzmán, Edgar A., Castañeda, Antonio García, Dunning, Iain, Zhu, Tina, McKee, Kevin R., Koster, Raphael, Roff, Heather, Graepel, Thore
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.
Learning through Probing: a decentralized reinforcement learning architecture for social dilemmas
Anastassacos, Nicolas, Musolesi, Mirco
Multi-agent reinforcement learning has received significant interest in recent years notably due to the advancements made in deep reinforcement learning which have allowed for the developments of new architectures and learning algorithms. Using social dilemmas as the training ground, we present a novel learning architecture, Learning through Probing (LTP), where agents utilize a probing mechanism to incorporate how their opponent's behavior changes when an agent takes an action. We use distinct training phases and adjust rewards according to the overall outcome of the experiences accounting for changes to the opponents behavior. We introduce a parameter η to determine the significance of these future changes to opponent behavior. When applied to the Iterated Prisoner's Dilemma, LTP agents demonstrate that they can learn to cooperate with each other, achieving higher average cumulative rewards than other reinforcement learning methods while also maintaining good performance in playing against static agents that are present in Axelrod tournaments. We compare this method with traditional reinforcement learning algorithms and agent-tracking techniques to highlight key differences and potential applications. We also draw attention to the differences between solving games and societal-like interactions and analyze the training of Q-learning agents in makeshift societies. This is to emphasize how cooperation may emerge in societies and demonstrate this using environments where interactions with opponents are determined through a random encounter format of the iterated prisoner's dilemma.
Towards Game-based Metrics for Computational Co-creativity
Canaan, Rodrigo, Menzel, Stefan, Togelius, Julian, Nealen, Andy
Abstract--We propose the following question: what gamelike interactive system would provide a good environment for measuring the impact and success of a co-creative, cooperative agent? Creativity is often formulated in terms of novelty, value, surprise and interestingness. We review how these concepts are measured in current computational intelligence research and provide a mapping from modern electronic and tabletop games to open research problems in mixed-initiative systems and computational co-creativity. We propose application scenarios for future research, and a number of metrics under which the performance of cooperative agents in these environments will be evaluated. I. INTRODUCTION Designing intelligent agents characterized by a co-creative, cooperative behavior would mark a major breakthrough in the age of industrial man-machine interaction. Exchanging relevant information with suitable time frequency and enriching the partner (human or machine) with novel perspectives and solution strategies on the problem are key factors for desirable results (considering the value of the output and the effort required). Cooperative games offer the valuable opportunity to realize an interactive environment for developing and evaluating computational methods used by these agents. In this paper we review concepts and implementations of cooperative games in the light of their capability to impact development processes in (industrial) environments with co-evolution and co-creativity as important expressions for cooperation. Having a working definition of computational creativity, and how creative systems and their outputs are judged in terms of their value, novelty, interestingness, and surprise, will help us understand cooperatively creative agents and might help us build them as well. Computational creativity and AIassisted design are important application areas for computational intelligence techniques such as neural networks, reinforcement learning and evolutionary computation; further, the conceptualization of creativity as search in a design space fits well with design applications of evolutionary computation.