Agent Societies
Learning to Collaborate: Multi-Scenario Ranking via Multi-Agent Reinforcement Learning
Feng, Jun, Li, Heng, Huang, Minlie, Liu, Shichen, Ou, Wenwu, Wang, Zhirong, Zhu, Xiaoyan
Ranking is a fundamental and widely studied problem in scenarios such as search, advertising, and recommendation. However, joint optimization for multi-scenario ranking, which aims to improve the overall performance of several ranking strategies in different scenarios, remains largely unexplored. Separately optimizing each individual strategy has two limitations. The first is the lack of collaboration between scenarios: each strategy maximizes its own objective but ignores the goals of other strategies, leading to sub-optimal overall performance. The second is the inability to model the correlation between scenarios: independent optimization in one scenario uses only its own user data and ignores the context in other scenarios. In this paper, we formulate multi-scenario ranking as a fully cooperative, partially observable, multi-agent sequential decision problem. We propose a novel model named Multi-Agent Recurrent Deterministic Policy Gradient (MA-RDPG), which has a communication component for passing messages, several private actors (agents) that take ranking actions, and a centralized critic for evaluating the overall performance of the co-working actors. Each scenario is treated as an agent (actor). Agents collaborate with each other by sharing a global action-value function (the critic) and passing messages that encode historical information across scenarios. The model is evaluated in online settings on a large E-commerce platform. Results show that the proposed model achieves significant improvements over baselines in terms of overall performance.
CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning
Yang, Jiachen, Nakhaei, Alireza, Isele, David, Zha, Hongyuan, Fujimura, Kikuo
We propose CM3, a new deep reinforcement learning method for cooperative multi-agent problems where agents must coordinate for joint success in achieving different individual goals. We restructure multi-agent learning into a two-stage curriculum, consisting of a single-agent stage for learning to accomplish individual tasks, followed by a multi-agent stage for learning to cooperate in the presence of other agents. These two stages are bridged by modular augmentation of neural network policy and value functions. We further adapt the actor-critic framework to this curriculum by formulating local and global views of the policy gradient and learning via a double critic, consisting of a decentralized value function and a centralized action-value function. We evaluated CM3 on a new high-dimensional multi-agent environment with sparse rewards: negotiating lane changes among multiple autonomous vehicles in the Simulation of Urban Mobility (SUMO) traffic simulator. Detailed ablation experiments show the positive contribution of each component in CM3, and the overall synthesis converges significantly faster to higher performance policies than existing cooperative multi-agent methods.
Detecting Intentions of Vulnerable Road Users Based on Collective Intelligence
Bieshaar, Maarten, Reitberger, Günther, Zernetsch, Stefan, Sick, Bernhard, Fuchs, Erich, Doll, Konrad
Vulnerable road users (VRUs, i.e. cyclists and pedestrians) will play an important role in future traffic. To avoid accidents and achieve a highly efficient traffic flow, it is important to detect VRUs and to predict their intentions. In this article, a holistic approach for detecting intentions of VRUs by cooperative methods is presented. The intention detection consists of basic movement primitive prediction, e.g. standing, moving, turning, and a forecast of the future trajectory. Vehicles equipped with sensors, data processing systems and communication abilities, referred to as intelligent vehicles, acquire and maintain a local model of their surrounding traffic environment, e.g. crossing cyclists. Heterogeneous, open sets of agents (cooperating and interacting vehicles, infrastructure, e.g. cameras and laser scanners, and VRUs equipped with smart devices and body-worn sensors) exchange information, forming a multi-modal sensor system with the goal of reliably and robustly detecting VRUs and their intentions while taking real-time requirements and uncertainties into account. The resulting model makes it possible to extend the perceptual horizon of an individual agent beyond its own sensory capabilities, enabling a longer forecast horizon. Concealments, implausibilities and inconsistencies are resolved by the collective intelligence of cooperating agents. Novel techniques of signal processing and modelling, in combination with analytical and learning-based approaches to pattern and activity recognition, are used for detection as well as intention prediction of VRUs. Cooperation, by means of probabilistic sensor and knowledge fusion, takes place on the level of perception and intention recognition. Based on the communication requirements of the cooperative approach, a new strategy for an ad hoc network is proposed.
Community Regularization of Visually-Grounded Dialog
Agarwal, Akshat, Gurumurthy, Swaminathan, Sharma, Vasu, Lewis, Mike, Sycara, Katia
The task of conducting visually grounded dialog involves learning goal-oriented cooperative dialog between autonomous agents who exchange information about a scene through several rounds of questions and answers in natural language. We posit that requiring artificial agents to adhere to the rules of human language, while also requiring them to maximize information exchange through dialog, is an ill-posed problem. We observe that humans do not stray from a common language because they are social creatures who live in communities and have to communicate with many people every day, so it is far easier to stick to a common language even at the cost of some efficiency loss. Using this as inspiration, we propose and evaluate a multi-agent community-based dialog framework where each agent interacts with, and learns from, multiple agents, and show that this community-enforced regularization results in more relevant and coherent dialog (as judged by human evaluators) without sacrificing task performance (as judged by quantitative metrics).
A Roadmap for the Value-Loading Problem
We analyze the value-loading problem. This is the problem of encoding moral values into an AI agent interacting with a complex environment. Like many before, we argue that this is both a major concern and an extremely challenging problem. Solving it will likely require years, if not decades, of multidisciplinary work by teams of top scientists and experts. Given how uncertain the timeline of human-level AI research is, we thus argue that a pragmatic partial solution should be designed as soon as possible. To this end, we propose a preliminary research program. This roadmap identifies several key steps. We hope that this will allow scholars, engineers and decision-makers to better grasp the upcoming difficulties, and to foresee how they can best contribute to the global effort.
If we fight cyberattacks alone, we're doomed to fail
Eugene Kaspersky
The safety of our online lives has become increasingly important. Whether it be interference in elections, attacks by hostile forces, or online fraud, the security of the web feels fragile. Cybersecurity has reached a crossroads and we need to decide where it goes next. The outcome will touch each of us: will we pay more and yet still be less safe? Will we face higher insurance premiums and bank charges to cover the rising number of cyber-incidents?
Anonymous Hedonic Game for Task Allocation in a Large-Scale Multiple Agent System
Jang, Inmo, Shin, Hyo-Sang, Tsourdos, Antonios
Cooperation among a large number of possibly small robots, called a robotic swarm, will play a significant role in complex missions that existing operational concepts using a few large robots cannot handle [1]. Even if no single robot (also called an agent) in a swarm is capable of accomplishing a task alone, their cooperation will lead to successful outcomes [2]-[5]. Possible applications include environmental monitoring [6], ad-hoc network relay [7], disaster management [8], and cooperative radar jamming [9], to name a few. Because of the large cardinality of a swarm robot system, however, it is infeasible for human operators to supervise each agent directly; instead, the swarm must be entrusted with certain levels of decision-making (e.g., task allocation, path planning, and individual control). What remains is to provide a high-level mission description, which is manageable for a few or even a single human operator. Nevertheless, various challenges remain in the autonomous decision-making of robotic swarms. Among them, this paper addresses a task allocation problem where the number of agents is higher than that of tasks: how to partition a set of agents into subgroups and assign the subgroups to each task.
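The partitioning problem described above can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a hypothetical utility in which each task has a fixed reward shared equally by the agents working on it, so an agent's preference depends only on the task and the size of its subgroup (the "anonymous" part of the title). Agents take turns switching to whichever task they individually prefer until nobody wants to move. The names `allocate`, `task_rewards` and `max_rounds` are illustrative choices made here.

```python
def allocate(num_agents, task_rewards, max_rounds=1000):
    """Partition agents among tasks by sequential best response.

    Assumption (not from the source): an agent on a task with reward r
    and n participants receives utility r / n.
    Returns a list mapping each agent index to a task index.
    """
    num_tasks = len(task_rewards)
    assignment = [0] * num_agents      # every agent starts on task 0
    counts = [0] * num_tasks           # number of agents per task
    counts[0] = num_agents

    def utility(task, size):
        # Anonymous preference: depends on the task and subgroup size only,
        # never on the identities of the other members.
        return task_rewards[task] / size

    for _ in range(max_rounds):
        changed = False
        for agent in range(num_agents):
            current = assignment[agent]
            best_task = current
            best_value = utility(current, counts[current])
            for task in range(num_tasks):
                if task == current:
                    continue
                # Utility the agent would get after joining this task.
                value = utility(task, counts[task] + 1)
                if value > best_value:
                    best_task, best_value = task, value
            if best_task != current:
                counts[current] -= 1
                counts[best_task] += 1
                assignment[agent] = best_task
                changed = True
        if not changed:
            break                      # no agent wants to deviate
    return assignment


if __name__ == "__main__":
    print(allocate(6, [3.0, 2.0, 1.0]))
```

With six agents and rewards 3, 2 and 1, the loop settles on subgroups of sizes 3, 2 and 1, at which point no agent gains by switching unilaterally. The `max_rounds` cap is only a safeguard for other utility choices, where such switching is not guaranteed to settle.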
Deep Reinforcement Learning for Swarm Systems
Hüttenrauch, Maximilian, Šošić, Adrian, Neumann, Gerhard
Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions and a neural network learned end-to-end. We evaluate the representation on two well known problems from the swarm literature (rendezvous and pursuit evasion), in a globally and locally observable setup. For the local setup we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents facilitating the development of more complex collective strategies.
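To make the two properties named above concrete, here is a minimal sketch of an empirical mean embedding with radial basis function features. It is an illustration under assumptions, not the paper's implementation: the function names, the `centers` grid and the `bandwidth` parameter are choices made here, and the observed states are plain tuples of floats.

```python
import math


def rbf_features(state, centers, bandwidth=1.0):
    """Map one agent state to a vector of Gaussian RBF activations."""
    features = []
    for center in centers:
        sq_dist = sum((s - c) ** 2 for s, c in zip(state, center))
        features.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
    return features


def mean_embedding(neighbor_states, centers, bandwidth=1.0):
    """Average the feature vectors of all observed agents.

    The result has one entry per center, whatever the number of agents,
    and does not depend on the order in which agents are listed.
    """
    if not neighbor_states:
        return [0.0] * len(centers)    # nothing observed
    feats = [rbf_features(s, centers, bandwidth) for s in neighbor_states]
    n = len(feats)
    return [sum(column) / n for column in zip(*feats)]


if __name__ == "__main__":
    centers = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]    # hypothetical grid
    swarm = [(0.1, 0.2), (1.5, 0.5), (0.9, 1.1)]      # observed positions
    print(mean_embedding(swarm, centers))
    print(mean_embedding(list(reversed(swarm)), centers))
```

Because the features are averaged rather than concatenated, the vector fed to the policy keeps the same length as the swarm grows, and reordering the agents leaves it unchanged up to floating-point rounding.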
How game complexity affects the playing behavior of synthetic agents
Kiourt, Chairi, Kalles, Dimitris, Kanellopoulos, Panagiotis
Agent-based simulation of social organizations, via the investigation of agents' training and learning tactics and strategies, has been inspired by the ability of humans to learn from social environments which are rich in agents, interactions and partial or hidden information. Such richness is a source of complexity that an effective learner has to be able to navigate. This paper investigates the impact of environmental complexity on the game playing-and-learning behavior of synthetic agents. We demonstrate our approach using two independent turn-based zero-sum games as the basis for forming social events which are characterized by both competition and cooperation. The paper's key highlight is that as the complexity of a social environment changes, an effective player has to adapt its learning and playing profile to maintain a given performance profile.
Capture the Flag: the emergence of complex cooperative agents
DeepMind
Billions of people inhabit the planet, each with their own individual goals and actions, but still capable of coming together through teams, organisations and societies in impressive displays of collective intelligence. This is a setting we call multi-agent learning: many individual agents must act independently, yet learn to interact and cooperate with other agents. This is an immensely difficult problem - because with co-adapting agents the world is constantly changing. To investigate this problem we look at 3D first-person multiplayer video games.