 Agent Societies


Scalable Centralized Deep Multi-Agent Reinforcement Learning via Policy Gradients

arXiv.org Artificial Intelligence

In this paper, we explore using deep reinforcement learning for problems with multiple agents. Most existing methods for deep multi-agent reinforcement learning consider only a small number of agents. When the number of agents increases, the dimensionality of the input and control spaces increases as well, and these methods do not scale well. To address this, we propose casting the multi-agent reinforcement learning problem as a distributed optimization problem. Our algorithm assumes that for multi-agent settings, policies of individual agents in a given population live close to each other in parameter space and can be approximated by a single policy. With this simple assumption, we show our algorithm to be extremely effective for reinforcement learning in multi-agent settings. We demonstrate its effectiveness against existing comparable approaches on co-operative and competitive tasks.


A Study of AI Population Dynamics with Million-agent Reinforcement Learning

arXiv.org Artificial Intelligence

We conduct an empirical study on discovering the ordered collective dynamics obtained by a population of intelligent agents, driven by million-agent reinforcement learning. Our intention is to put intelligent agents into a simulated natural context and verify whether the principles developed in the real world can also be used to understand an artificially created intelligent population. To achieve this, we simulate a large-scale predator-prey world, where the laws of the world are designed using only findings, or their logical equivalents, that have been discovered in nature. We endow the agents with intelligence based on deep reinforcement learning (DRL). In order to scale the population size up to millions of agents, a large-scale DRL training platform with a redesigned experience buffer is proposed. Our results show that the population dynamics of AI agents, driven only by each agent's individual self-interest, reveal an ordered pattern that is similar to the Lotka-Volterra model studied in population biology. We further discover emergent behaviors of collective adaptation by studying how the agents' grouping behaviors change with the environmental resources. Both findings can be explained by the self-organization theory in nature.
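For readers unfamiliar with the Lotka-Volterra model mentioned above, the following is a minimal sketch of its classical two-species form, dx/dt = alpha*x - beta*x*y and dy/dt = delta*x*y - gamma*y, integrated with a simple forward-Euler step. The function name, the parameter values, and the step size are illustrative assumptions; they are not taken from the paper, and this is not the paper's million-agent simulation.

```python
def simulate_lotka_volterra(prey, predators, alpha=1.0, beta=0.1,
                            delta=0.075, gamma=1.5, dt=0.001, steps=20000):
    """Forward-Euler integration of the classical Lotka-Volterra equations.

    prey, predators: initial population sizes (x and y).
    alpha: prey growth rate; beta: predation rate;
    delta: predator growth per prey eaten; gamma: predator death rate.
    All default values are illustrative assumptions.
    Returns a list of (prey, predators) tuples, one per time step.
    """
    history = [(prey, predators)]
    for _ in range(steps):
        # Rates of change evaluated at the current state.
        d_prey = (alpha * prey - beta * prey * predators) * dt
        d_pred = (delta * prey * predators - gamma * predators) * dt
        prey += d_prey
        predators += d_pred
        history.append((prey, predators))
    return history


if __name__ == "__main__":
    trajectory = simulate_lotka_volterra(prey=10.0, predators=5.0)
    # Print a coarse sample of the trajectory to see the populations cycle.
    for step in range(0, len(trajectory), 2000):
        x, y = trajectory[step]
        print(f"t={step * 0.001:5.1f}  prey={x:8.3f}  predators={y:8.3f}")
```

The populations cycle around the fixed point x = gamma/delta, y = alpha/beta rather than settling; forward Euler drifts slowly outward over long horizons, so a smaller step or a higher-order integrator would be preferable for longer runs.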


Multi-Agent Path Finding with Deadlines: Preliminary Results

arXiv.org Artificial Intelligence

We formalize the problem of multi-agent path finding with deadlines (MAPF-DL). The objective is to maximize the number of agents that can reach their given goal vertices from their given start vertices within a given deadline, without colliding with each other. We first show that the MAPF-DL problem is NP-hard to solve optimally. We then present an optimal MAPF-DL algorithm based on a reduction of the MAPF-DL problem to a flow problem and a subsequent compact integer linear programming formulation of the resulting reduced abstracted multi-commodity flow network.
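The objective described above can be illustrated with a small checker that scores a candidate set of paths. This is only a sketch under stated assumptions, not the paper's flow-based algorithm: each path is a list of vertices indexed by time step, an agent waits at its last vertex once its path ends, and a collision is either two agents on the same vertex at the same time or two agents swapping vertices in one step. The function names are hypothetical.

```python
def position(path, t):
    """Vertex occupied at time t; the agent is assumed to wait at its last vertex."""
    return path[min(t, len(path) - 1)]


def has_collision(paths, deadline):
    """Return True if any two agents collide at or before the deadline."""
    for t in range(deadline + 1):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                a_now, b_now = position(paths[i], t), position(paths[j], t)
                if a_now == b_now:
                    return True  # vertex collision
                if t < deadline:
                    a_next = position(paths[i], t + 1)
                    b_next = position(paths[j], t + 1)
                    if a_now == b_next and a_next == b_now:
                        return True  # swap (edge) collision
    return False


def agents_on_time(paths, goals, deadline):
    """Count agents that occupy their goal vertex at the deadline."""
    return sum(1 for path, goal in zip(paths, goals)
               if position(path, deadline) == goal)


if __name__ == "__main__":
    # Two agents on a line graph 0 - 1 - 2 - 3 with a deadline of 2 steps.
    paths = [[0, 1, 2], [3, 3, 3]]
    goals = [2, 3]
    if not has_collision(paths, deadline=2):
        print(agents_on_time(paths, goals, deadline=2))
```

An optimal solver would search over collision-free path sets to maximize the value returned by `agents_on_time`; the checker only evaluates one candidate.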


An Optimal Rewiring Strategy for Reinforcement Social Learning in Cooperative Multiagent Systems

arXiv.org Artificial Intelligence

Multiagent coordination in cooperative multiagent systems (MASs) has been widely studied in both the fixed-agent repeated interaction setting and the static social learning framework. However, two aspects of dynamics in real-world multiagent scenarios are currently missing in existing works. First, the network topologies can be dynamic, where agents may change their connections through rewiring during the course of interactions. Second, the game matrix between each pair of agents may not be static and is usually not known a priori. Both the network dynamics and the game uncertainty increase the coordination difficulty among agents. In this paper, we consider a multiagent dynamic social learning environment in which each agent can choose to rewire potential partners and interact with randomly chosen neighbors in each round. We propose an optimal rewiring strategy for agents to select the most beneficial peers to interact with for the purpose of maximizing the accumulated payoff in repeated interactions. We empirically demonstrate the effectiveness and robustness of our approach by comparing it with benchmark strategies. The performance of three representative learning strategies under our social learning framework with our optimal rewiring is investigated as well.


DDoS-for-Hire website taken down in global collaboration of law enforcement agencies

#artificialintelligence

Webstresser.org, a popular DDoS-for-hire service, was taken down on Wednesday by authorities from the US, UK, Netherlands, and various other countries in a major international investigation, and arrests have been made. The website is blamed for more than four million cyber attacks globally in the past three years and had over 134,000 registered users at the time of the takedown. The operation, dubbed "Operation Power OFF," targeted Webstresser.org. It involved law enforcement agencies from the Netherlands, United Kingdom, Serbia, Croatia, Spain, Italy, Germany, Australia, Hong Kong, Canada, and the United States of America, coordinating with Europol. The domain name was seized by the US Department of Defense.


Motion Planning Among Dynamic, Decision-Making Agents with Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Robots that navigate among pedestrians use collision avoidance algorithms to enable safe and efficient operation. Recent works present deep reinforcement learning as a framework to model the complex interactions and cooperation. However, they are implemented using key assumptions about other agents' behavior that deviate from reality as the number of agents in the environment increases. This work extends our previous approach to develop an algorithm that learns collision avoidance among a variety of types of dynamic agents without assuming they follow any particular behavior rules. This work also introduces a strategy using LSTM that enables the algorithm to use observations of an arbitrary number of other agents, unlike previous methods, which have a fixed observation size. The proposed algorithm outperforms our previous approach in simulation as the number of agents increases, and the algorithm is demonstrated on a fully autonomous robotic vehicle traveling at human walking speed, without the use of a 3D Lidar.


A Logic of Agent Organizations

arXiv.org Artificial Intelligence

Organization concepts and models are increasingly being adopted for the design and specification of multi-agent systems. Agent organizations can be seen as mechanisms of social order, created to achieve global (or organizational) objectives by more or less autonomous agents. In order to develop a theory on the relation between organizational structures, organizational objectives and the actions of agents fulfilling roles in the organization, a theoretical framework is needed to describe organizational structures and actions of (groups of) agents. Current logical formalisms focus on specific aspects of organizations (e.g. power, delegation, agent actions, or normative issues), but a framework that integrates and relates different aspects is missing. Given the number of aspects involved and the resulting complexity of a formalism encompassing them all, such a framework is difficult to realize. In this paper, a first step is taken to solve this problem. We present a generic formal model that makes it possible to specify and relate the main concepts of an organization (including activity, structure, environment and others) so that organizations can be analyzed at a high level of abstraction. However, for some aspects we use a simplified model in order to avoid the complexity of combining many different types of (modal) operators.


Distributed Constraint Optimization Problems and Applications: A Survey

Journal of Artificial Intelligence Research

The field of multi-agent systems (MAS) is an active area of research within artificial intelligence, with an increasingly important impact in industrial and other real-world applications. In a MAS, autonomous agents interact to pursue personal interests and/or to achieve common objectives. Distributed Constraint Optimization Problems (DCOPs) have emerged as a prominent agent model to govern the agents' autonomous behavior, where both algorithms and communication models are driven by the structure of the specific problem. During the last decade, several extensions to the DCOP model have been proposed to enable support of MAS in complex, real-time, and uncertain environments. This survey provides an overview of the DCOP model, offering a classification of its multiple extensions and addressing both resolution methods and applications that find a natural mapping within each class of DCOPs. The proposed classification suggests several future perspectives for DCOP extensions and identifies challenges in the design of efficient resolution algorithms, possibly through the adaptation of strategies from different areas.
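To make the DCOP model above concrete, here is a minimal sketch of its basic ingredients: variables (one per agent), finite domains, and cost functions over subsets of variables, with the goal of minimizing the total cost. The solver below is a centralized brute-force baseline written purely for illustration; it is not one of the distributed resolution methods the survey covers, and the function name and the toy problem are assumptions.

```python
from itertools import product


def solve_dcop(domains, constraints):
    """Exhaustively search for a minimum-cost assignment.

    domains: dict mapping each variable name to a list of possible values.
    constraints: list of (scope, cost_fn) pairs, where scope is a tuple of
        variable names and cost_fn takes their values in that order.
    Returns (best_assignment, best_cost).
    """
    variables = sorted(domains)
    best_assignment, best_cost = None, float("inf")
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        # Total cost is the sum of all constraint costs under this assignment.
        cost = sum(fn(*(assignment[v] for v in scope))
                   for scope, fn in constraints)
        if cost < best_cost:
            best_assignment, best_cost = assignment, cost
    return best_assignment, best_cost


if __name__ == "__main__":
    # Toy problem: three agents on a path a - b - c each pick a value in {0, 1}.
    # Neighbors pay a cost of 1 for choosing the same value, and agent a
    # additionally prefers the value 1.
    domains = {"a": [0, 1], "b": [0, 1], "c": [0, 1]}
    constraints = [
        (("a", "b"), lambda a, b: 1 if a == b else 0),
        (("b", "c"), lambda b, c: 1 if b == c else 0),
        (("a",), lambda a: 0 if a == 1 else 1),
    ]
    print(solve_dcop(domains, constraints))
```

Exhaustive search is exponential in the number of variables, which is one reason practical DCOP algorithms exploit the structure of the constraint graph and exchange messages only between neighboring agents.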


LfD Training of Heterogeneous Formation Behaviors

AAAI Conferences

Problem domains such as disaster relief, search and rescue, and games can benefit from having a human quickly train coordinated behaviors for a diverse set of agents. Hierarchical Training of Agent Behaviors (HiTAB) is a Learning from Demonstration (LfD) approach that addresses some inherent complexities in multiagent learning, making it possible to train complex heterogeneous behaviors from a small set of training samples. In this paper, we successfully demonstrate LfD training of formation behaviors using a small set of agents that, without retraining, continue to operate correctly when additional agents are available. We selected training of formations for the experiments because formations require a great deal of coordination between agents, are heterogeneous due to the differing roles of participating agents, and can scale as the number of agents grows. We also introduce some extensions to HiTAB that facilitate this type of training.


Valuing knowledge, information and agency in Multi-agent Reinforcement Learning: a case study in smart buildings

arXiv.org Machine Learning

Increasing energy efficiency in buildings can reduce costs and emissions substantially. Historically, this has been treated as a local, or single-agent, optimization problem. However, many buildings utilize the same types of thermal equipment, e.g. electric heaters and hot water vessels. During operation, occupants in these buildings interact with the equipment differently, thereby driving them to diverse regions in the state-space. Reinforcement learning agents can learn from these interactions, recorded as sensor data, to optimize the overall energy efficiency. However, if these agents operate individually at a household level, they cannot exploit the replicated structure in the problem. In this paper, we demonstrate that this problem can indeed benefit from multi-agent collaboration by making use of targeted exploration of the state-space, allowing for better generalization. We also investigate trade-offs between integrating human knowledge and additional sensors. Results show that savings of over 40% are possible with collaborative multi-agent systems making use of either expert knowledge or additional sensors, with no loss of occupant comfort. We find that such multi-agent systems comfortably outperform comparable single-agent systems.