Agents
Inequity aversion improves cooperation in intertemporal social dilemmas
Hughes, Edward, Leibo, Joel Z., Phillips, Matthew G., Tuyls, Karl, Duéñez-Guzmán, Edgar A., Castañeda, Antonio García, Dunning, Iain, Zhu, Tina, McKee, Kevin R., Koster, Raphael, Roff, Heather, Graepel, Thore
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity-averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.
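The abstract above does not state a formula, so the following is only a minimal sketch of the standard Fehr-Schmidt inequity-averse utility from behavioral economics, applied to the payoffs of a single stateless game. It is not the paper's Markov-game extension. The function name, the parameter names `alpha` (aversion to being behind) and `beta` (aversion to being ahead), and the example payoffs are assumptions made for illustration.

```python
from typing import Sequence


def inequity_averse_utility(payoffs: Sequence[float], i: int,
                            alpha: float, beta: float) -> float:
    """Fehr-Schmidt utility of player i given everyone's material payoffs.

    alpha weights disadvantageous inequity (others earn more than i),
    beta weights advantageous inequity (i earns more than others).
    """
    n = len(payoffs)
    own = payoffs[i]
    if n < 2:
        return float(own)  # no one to compare against
    others = [p for j, p in enumerate(payoffs) if j != i]
    behind = sum(max(p - own, 0.0) for p in others)
    ahead = sum(max(own - p, 0.0) for p in others)
    return own - alpha / (n - 1) * behind - beta / (n - 1) * ahead


# Illustrative two-player outcome: player 0 defects and earns 3,
# player 1 cooperates and earns 0.
defector = inequity_averse_utility([3.0, 0.0], 0, alpha=1.0, beta=0.5)
cooperator = inequity_averse_utility([3.0, 0.0], 1, alpha=1.0, beta=0.5)
```

With these illustrative parameters the defector's utility is lowered by the advantageous-inequity term and the exploited player's utility is lowered by the disadvantageous-inequity term, which is the mechanism the abstract describes for stateless matrix games.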
A Way to Facilitate Decision Making in a Mixed Group of Manned and Unmanned Aerial Vehicles
Maximov, Dmitry, Legovich, Yury, Goncharenko, Vladimir
A mixed group of manned and unmanned aerial vehicles is treated as a distributed system. The system is associated with a lattice of the tasks it can fulfill. An external multiplication operation is defined on the lattice, which in turn defines the corresponding linear logic operations. Linear implication and tensor product are used to choose a system reconfiguration variant, i.e., to select a new executor for a task. The choice is largely determined by the structure of the task lattice (i.e., the system's purpose) and by the definitions of the operations. Thus, the choice is mainly a consequence of the system's purpose. The suggested method is illustrated with an example of controlling a mixed group during forest fire suppression.
Keywords: Multi-Agent Systems · Decision making · Mixed Group · Goal Lattice · Linear logic
1 Introduction
Aviation surveillance systems are now widely used in emergency zones [1]. Recently, unmanned aerial vehicles (UAVs) have been actively used in these surveillance systems.
NVM at DF18: AI trends to watch
Technology in today's Age of the Customer is simultaneously increasing customer expectations and making service more complex. In the last few years, smarter algorithms, artificial intelligence (AI), self-service channels and analytics have exploded, and 56% of global consumers say they have higher expectations for customer service now than they had just one year ago. This wave of innovation is also bringing exciting opportunities for service managers to transform their brand's customer experience. Bluewolf, an IBM Company, predicted that AI will impact customer service in four key areas in 2018. Guiding -- Predictive and machine learning models to instruct next best action with the customer.
Learning through Probing: a decentralized reinforcement learning architecture for social dilemmas
Anastassacos, Nicolas, Musolesi, Mirco
Multi-agent reinforcement learning has received significant interest in recent years, notably due to advances in deep reinforcement learning that have allowed the development of new architectures and learning algorithms. Using social dilemmas as the training ground, we present a novel learning architecture, Learning through Probing (LTP), where agents utilize a probing mechanism to incorporate how their opponent's behavior changes when an agent takes an action. We use distinct training phases and adjust rewards according to the overall outcome of the experiences, accounting for changes to the opponent's behavior. We introduce a parameter η to determine the significance of these future changes to opponent behavior. When applied to the Iterated Prisoner's Dilemma, LTP agents demonstrate that they can learn to cooperate with each other, achieving higher average cumulative rewards than other reinforcement learning methods while also maintaining good performance against the static agents that are present in Axelrod tournaments. We compare this method with traditional reinforcement learning algorithms and agent-tracking techniques to highlight key differences and potential applications. We also draw attention to the differences between solving games and society-like interactions, and analyze the training of Q-learning agents in makeshift societies. This emphasizes how cooperation may emerge in societies, which we demonstrate using environments where interactions with opponents are determined through a random-encounter format of the Iterated Prisoner's Dilemma.
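For readers unfamiliar with the Iterated Prisoner's Dilemma mentioned above, here is a minimal sketch of the game with two static strategies of the kind found in Axelrod tournaments. It does not implement LTP or any learning. The payoff values (5, 3, 1, 0) are the conventional ones and are an assumption, since the abstract does not state them; all function names are illustrative.

```python
from typing import Callable, List, Tuple

C, D = "C", "D"  # cooperate, defect

# (my move, opponent move) -> (my reward, opponent reward); assumed values
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

# A strategy sees its own history and the opponent's history.
Strategy = Callable[[List[str], List[str]], str]


def tit_for_tat(own: List[str], other: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return other[-1] if other else C


def always_defect(own: List[str], other: List[str]) -> str:
    return D


def play(a: Strategy, b: Strategy, rounds: int) -> Tuple[int, int]:
    """Return the cumulative rewards of a and b after `rounds` rounds."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = a(hist_a, hist_b), b(hist_b, hist_a)
        reward_a, reward_b = PAYOFF[(move_a, move_b)]
        score_a += reward_a
        score_b += reward_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


mutual = play(tit_for_tat, tit_for_tat, 10)       # (30, 30)
exploited = play(tit_for_tat, always_defect, 10)  # (9, 14)
```

Two tit-for-tat players sustain cooperation and each earn more than either player in the tit-for-tat versus always-defect pairing, which illustrates why learning to cooperate raises cumulative reward.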
3D Pursuit-Evasion for AUVs
Özkahraman, Özer, Ögren, Petter
Abstract--In this paper, we consider the problem of pursuit-evasion using multiple Autonomous Underwater Vehicles (AUVs) in a 3D water volume, with and without simple obstacles. Pursuit-evasion is a well-studied topic in robotics, but the results are mostly set in 2D environments, using unlimited line-of-sight sensing. We propose an algorithm for range-limited sensing in 3D environments that captures a finite-speed evader based on a single previous observation of its location. The pursuers are first moved to form a maximal cage formation, based on their number and sensor ranges, containing all of the possible evader locations. The cage is then shrunk until every part of that volume is sensed, thereby capturing the evader. The pursuers need only limited sensing range and low-bandwidth communication, making the algorithm well suited for an underwater environment.
I. INTRODUCTION
Pursuit-evasion is a game played between two opposing sides, the pursuer(s) and evader(s).
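The abstract above relies on the fact that a finite-speed evader seen once can only be within a bounded region afterwards. The sketch below shows that bound for an obstacle-free volume: a ball around the last observation whose radius is the maximum speed times the elapsed time. It does not implement the cage formation or the shrinking step, and the function and parameter names are assumptions for illustration.

```python
import math
from typing import Sequence


def reachable_radius(max_speed: float, elapsed: float) -> float:
    """Largest distance the evader can have travelled since it was observed."""
    return max_speed * elapsed


def may_contain_evader(point: Sequence[float], last_seen: Sequence[float],
                       max_speed: float, elapsed: float) -> bool:
    """True if `point` lies in the ball of possible evader locations.

    Obstacles are ignored here; they can only shrink the reachable set,
    so the ball remains a conservative over-approximation.
    """
    return math.dist(point, last_seen) <= reachable_radius(max_speed, elapsed)


# Evader last seen at the origin 5 s ago, moving at most 1 m/s.
inside = may_contain_evader((3.0, 4.0, 0.0), (0.0, 0.0, 0.0), 1.0, 5.0)
outside = may_contain_evader((3.0, 4.0, 1.0), (0.0, 0.0, 0.0), 1.0, 5.0)
```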
Hierarchical Deep Multiagent Reinforcement Learning
Tang, Hongyao, Hao, Jianye, Lv, Tangjie, Chen, Yingfeng, Zhang, Zongzhang, Jia, Hangtian, Ren, Chunxu, Zheng, Yan, Fan, Changjie, Wang, Li
Although deep reinforcement learning has recently achieved great successes, a number of challenges remain in multiagent environments. Multiagent reinforcement learning (MARL) is commonly considered to suffer from the problem of non-stationary environments and an exponentially increasing policy space. It is even more challenging to learn effective policies when rewards are sparse and delayed over long trajectories. In this paper, we study Hierarchical Deep Multiagent Reinforcement Learning (hierarchical deep MARL) in cooperative multiagent problems with sparse and delayed rewards, where efficient multiagent learning methods are desperately needed. We decompose the original MARL problem into hierarchies and investigate how effective policies can be learned hierarchically in synchronous/asynchronous hierarchical MARL frameworks. Several hierarchical deep MARL architectures, i.e., Ind-hDQN, hCom and hQmix, are introduced for different learning paradigms. Moreover, to alleviate the issues of sparse experiences in high-level learning and non-stationarity in multiagent settings, we propose a new experience replay mechanism, named Augmented Concurrent Experience Replay (ACER). We empirically demonstrate the effects and efficiency of our approaches in several classic Multiagent Trash Collection tasks, as well as in an extremely challenging team sports game, i.e., Fever Basketball Defense.
Evolving Agents for the Hanabi 2018 CIG Competition
Canaan, Rodrigo, Shen, Haotian, Torrado, Ruben Rodriguez, Togelius, Julian, Nealen, Andy, Menzel, Stefan
Abstract--Hanabi is a cooperative card game with hidden information that has won important awards in the industry and received some recent academic attention. A two-track competition of agents for the game will take place at the 2018 CIG conference. In this paper, we develop a genetic algorithm that builds rule-based agents by determining the best sequence of rules from a fixed rule set to use as a strategy. In three separate experiments, we remove human assumptions regarding the ordering of rules, add new, more expressive rules to the rule set, and independently evolve agents specialized for specific game sizes. As a result, we achieve scores superior to previously published research for the mirror and mixed evaluation of agents. Game-playing agents have a long tradition of serving as benchmarks for AI research. However, traditionally most of the focus has been on competitive, perfect-information games, such as Checkers [1], Chess [2] and Go [3]. Cooperative games with imperfect information provide an interesting research topic not only due to the added challenges posed to researchers, but also because many modern industrial and commercial applications can be characterized as examples of cooperation between humans and machines in order to achieve a mutual goal in an uncertain environment. In this paper, we address a particularly interesting cooperative game with partial information: Hanabi [4].
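The idea of evolving the order of rules from a fixed rule set can be sketched as a small permutation-based genetic algorithm. This is not the authors' implementation: the rule names, the swap-only mutation, the truncation selection, and the toy fitness function (which stands in for an average game score) are all hypothetical choices made for illustration.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical rule names; the abstract does not list the actual rule set.
RULES = ["play_safe_card", "hint_playable_card",
         "discard_useless_card", "discard_oldest_card"]
TARGET = ["play_safe_card", "discard_useless_card",
          "hint_playable_card", "discard_oldest_card"]


def toy_fitness(order: Sequence[str]) -> int:
    """Stand-in for an average game score: positions matching TARGET."""
    return sum(1 for a, b in zip(order, TARGET) if a == b)


def evolve_rule_order(rules: Sequence[str],
                      fitness: Callable[[Sequence[str]], float],
                      generations: int = 50, pop_size: int = 20,
                      seed: int = 0) -> List[str]:
    """Evolve an ordering of `rules`; needs >= 2 rules and pop_size >= 2."""
    rng = random.Random(seed)
    population = [rng.sample(list(rules), len(rules)) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitist truncation selection).
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            child = list(rng.choice(survivors))
            i, j = rng.sample(range(len(child)), 2)
            child[i], child[j] = child[j], child[i]  # swap mutation
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)


best_order = evolve_rule_order(RULES, toy_fitness)
```

In a real setting the fitness call would play many games with an agent that applies the first applicable rule in the given order; swap mutation keeps every individual a valid permutation of the rule set.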
Towards Game-based Metrics for Computational Co-creativity
Canaan, Rodrigo, Menzel, Stefan, Togelius, Julian, Nealen, Andy
Abstract--We pose the following question: what game-like interactive system would provide a good environment for measuring the impact and success of a co-creative, cooperative agent? Creativity is often formulated in terms of novelty, value, surprise and interestingness. We review how these concepts are measured in current computational intelligence research and provide a mapping from modern electronic and tabletop games to open research problems in mixed-initiative systems and computational co-creativity. We propose application scenarios for future research, and a number of metrics under which the performance of cooperative agents in these environments will be evaluated.
I. INTRODUCTION
Designing intelligent agents characterized by co-creative, cooperative behavior would mark a major breakthrough in the age of industrial man-machine interaction. Exchanging relevant information at a suitable frequency and enriching the partner (human or machine) with novel perspectives and solution strategies for the problem are key factors for desirable results (considering the value of the output and the effort required). Cooperative games offer a valuable opportunity to realize an interactive environment for developing and evaluating the computational methods used by these agents. In this paper we review concepts and implementations of cooperative games in light of their capability to impact development processes in (industrial) environments, with co-evolution and co-creativity as important expressions of cooperation. Having a working definition of computational creativity, and of how creative systems and their outputs are judged in terms of their value, novelty, interestingness, and surprise, will help us understand cooperatively creative agents and might help us build them as well.
Computational creativity and AI-assisted design are important application areas for computational intelligence techniques such as neural networks, reinforcement learning and evolutionary computation; further, the conceptualization of creativity as search in a design space fits well with design applications of evolutionary computation.
Interactions as Social Practices: towards a formalization
Multi-agent models are a suitable starting point for modeling complex social interactions. However, as the complexity of the systems increases, we argue that novel modeling approaches are needed that can deal with inter-dependencies at different levels of society, where many heterogeneous parties (software agents, robots, humans) are interacting and reacting to each other. In this paper, we present a formalization of a social framework for agents based on the concept of Social Practices as high-level specifications of normal (expected) behavior in a given social context. We argue that social practices facilitate the practical reasoning of agents in standard social interactions.
Complexity of Shift Bribery in Committee Elections
Bredereck, Robert, Faliszewski, Piotr, Niedermeier, Rolf, Talmon, Nimrod
Given an election, a preferred candidate p, and a budget, the SHIFT BRIBERY problem asks whether p can win the election after shifting p higher in some voters' preference orders. Of course, shifting comes at a price (depending on the voter and on the extent of the shift) and one must not exceed the given budget. We study the (parameterized) computational complexity of SHIFT BRIBERY for multiwinner voting rules, where winning the election means being part of some winning committee. We focus on the well-established SNTV, Bloc, k-Borda, and Chamberlin-Courant rules, as well as on approximate variants of the Chamberlin-Courant rule, since the original rule is NP-hard to compute. We show that SHIFT BRIBERY tends to be harder in the multiwinner setting than in the single-winner one by showing settings where SHIFT BRIBERY is easy in the single-winner cases, but is hard (and hard to approximate) in the multiwinner ones. Moreover, we show that the non-monotonicity of those rules which are based on approximation algorithms for the Chamberlin-Courant rule sometimes affects the complexity of SHIFT BRIBERY.
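To make the problem statement above concrete, the following sketch checks SHIFT BRIBERY by brute force for the SNTV rule on a tiny election. It is an exponential-time illustration, not an algorithm from the paper. Two details are assumptions, because the abstract leaves them open: each position shifted costs `unit_cost` for every voter, and ties in SNTV are broken alphabetically by candidate name.

```python
from itertools import product
from typing import List, Sequence, Set


def sntv_committee(votes: Sequence[Sequence[str]], k: int) -> Set[str]:
    """SNTV: the k candidates ranked first by the most voters."""
    candidates = sorted({c for vote in votes for c in vote})
    score = {c: 0 for c in candidates}
    for vote in votes:
        score[vote[0]] += 1
    # Assumed tie-breaking: alphabetical order among equal scores.
    ranked = sorted(candidates, key=lambda c: (-score[c], c))
    return set(ranked[:k])


def shift(vote: Sequence[str], p: str, steps: int) -> List[str]:
    """Move p forward by `steps` positions, keeping the others' order."""
    new_pos = vote.index(p) - steps
    rest = [c for c in vote if c != p]
    return rest[:new_pos] + [p] + rest[new_pos:]


def can_bribe(votes: Sequence[Sequence[str]], p: str, k: int,
              budget: int, unit_cost: int = 1) -> bool:
    """Try every combination of shifts that fits within the budget."""
    max_steps = [vote.index(p) for vote in votes]
    for steps in product(*(range(m + 1) for m in max_steps)):
        if unit_cost * sum(steps) > budget:
            continue
        bribed = [shift(v, p, s) for v, s in zip(votes, steps)]
        if p in sntv_committee(bribed, k):
            return True
    return False


votes = [["a", "b", "p"], ["a", "p", "b"], ["b", "a", "p"]]
too_cheap = can_bribe(votes, "p", k=1, budget=2)  # False
enough = can_bribe(votes, "p", k=1, budget=3)     # True
```

In this example p needs two first-place votes to beat the alphabetical tie-break, which costs three unit shifts in total, so a budget of 2 fails and a budget of 3 succeeds.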