 Agent Societies


Naomi Ehrich Leonard: Bio-inspired dynamics for multi-agent decision-making CMU RI Seminar

Robohub

Abstract: "I will present distributed decision-making dynamics for multi-agent systems, motivated by studies of animal groups, such as house-hunting honeybees, and their extraordinary ability to make collective decisions that are both robust to disturbance and adaptable to change. The dynamics derive from principles of symmetry, consensus, and bifurcation in networked systems, exploiting instability as a means to flexibly transition from one stable solution to another. Feedback dynamics are derived for the bifurcation control, a variable representing social effort, such that flexible transition is made a controlled adaptive response."
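The idea of using instability to switch between stable solutions can be illustrated with a one-variable pitchfork bifurcation. The sketch below is not the speaker's model; it assumes a hypothetical scalar opinion `x` governed by `dx/dt = -x + u * tanh(x)`, where `u` plays the role of the bifurcation (social effort) parameter. Under this assumption the neutral state `x = 0` is stable for `u < 1` and loses stability for `u > 1`, where the opinion settles on one of two symmetric decisions depending on a small initial bias.

```python
import math


def simulate_opinion(u, x0=0.1, dt=0.01, steps=5000):
    """Forward-Euler integration of dx/dt = -x + u * tanh(x).

    u  : assumed bifurcation parameter ("social effort"), hypothetical name
    x0 : small initial bias toward one of the two options
    """
    x = x0
    for _ in range(steps):
        x += dt * (-x + u * math.tanh(x))
    return x


if __name__ == "__main__":
    # Below the bifurcation point the opinion decays to the neutral state.
    print("u = 0.5:", simulate_opinion(0.5))
    # Above it, the sign of the initial bias selects the decision.
    print("u = 2.0, x0 > 0:", simulate_opinion(2.0, x0=0.1))
    print("u = 2.0, x0 < 0:", simulate_opinion(2.0, x0=-0.1))
```

Making `u` itself respond to feedback, as the abstract describes, would turn the transition between the two regimes into a controlled response; that feedback law is not reproduced here.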


Opinion: AI development needs global cooperation, not China-phobia - Xinhua

#artificialintelligence

Sophia, a life-like humanoid robot, is pictured at the UN headquarters in New York, Oct. 11, 2017. Sophia was here attending a meeting on "The Future of Everything - Sustainable Development in the Age of Rapid Technological Change". WASHINGTON, March 1 (Xinhua) -- China is gaining momentum in the artificial intelligence (AI) industry, translating its huge market size into commercialized innovations. This is a boon, not a threat. The cry-wolf alarms that America is losing a race for supremacy in the AI industry, which compare China's catching-up to America's Sputnik panic in the late 1950s, have in a sense misinterpreted or misrepresented the true AI story. A typical misinterpretation is the "global tech cold war" put forward by Eurasia Group, a New York-headquartered think tank, which argues that the winner in AI and super-computing between the United States and China will dominate the coming decades, both economically and geopolitically.


Lenient Multi-Agent Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Much of the success of single-agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently through sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated because agents update their policies in parallel [11]. In this work we apply leniency [23] to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates that are sampled from the ERM. This introduces optimism in the value-function update, and has been shown to facilitate cooperation in tabular fully cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm [22] as well as a modified version we call scheduled-HDQN, which uses average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP) [8], which include fully cooperative sub-tasks and stochastic rewards. We find that LDQN agents are more likely to converge to the optimal policy in a stochastic-reward CMOTP compared to standard and scheduled-HDQN agents.
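The leniency mechanism described above can be sketched in tabular form. This is a minimal illustration rather than the paper's LDQN: the leniency formula `1 - exp(-k * T)`, the constants `k`, `alpha`, and `decay`, and the function names are assumptions made for the example. Each state-action pair carries a temperature; a negative update is skipped with probability equal to the current leniency, and the temperature decays on every visit so the agent grows less forgiving over time.

```python
import math
import random


def leniency(temperature, k=2.0):
    """Assumed leniency form: close to 1 when hot, 0 when the temperature is 0."""
    return 1.0 - math.exp(-k * temperature)


def lenient_update(q, temps, key, target, alpha=0.1, decay=0.9, k=2.0, rng=random):
    """One lenient tabular update for a (state, action) key.

    q, temps : dicts mapping (state, action) -> value / temperature
    target   : bootstrapped target, e.g. r + gamma * max_a Q(s', a)
    rng      : any object with a random() method (hypothetical hook for testing)
    """
    old = q.get(key, 0.0)
    temp = temps.get(key, 1.0)
    delta = target - old
    # Positive updates always apply; negative ones only if the draw beats leniency.
    if delta > 0 or rng.random() > leniency(temp, k):
        q[key] = old + alpha * delta
    else:
        q[key] = old
    # Cool the pair down so later negative updates are less likely to be ignored.
    temps[key] = decay * temp
    return q[key]
```

In a replay-memory setting the same test would be applied to each sampled transition, which is where the optimism toward teammates' early exploratory mistakes comes from.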


Real-Time Bidding with Multi-Agent Reinforcement Learning in Display Advertising

arXiv.org Artificial Intelligence

Real-time advertising allows advertisers to bid for each impression for a visiting user. To optimize a specific goal such as maximizing the revenue generated by ad placements, advertisers not only need to estimate the relevance between the ads and the user's interests, but most importantly require a strategic response with respect to other advertisers bidding in the market. In this paper, we formulate bidding optimization with multi-agent reinforcement learning. To deal with a large number of advertisers, we propose a clustering method and assign each cluster a strategic bidding agent. A practical Distributed Coordinated Multi-Agent Bidding (DCMAB) method has been proposed and implemented to balance the tradeoff between competition and cooperation among advertisers. The empirical study on our industry-scale real-world data has demonstrated the effectiveness of our modeling methods. Our results show that cluster-based bidding largely outperforms single-agent and bandit approaches, and that coordinated bidding achieves better overall objectives than purely self-interested bidding agents.


Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents

arXiv.org Machine Learning

We consider the problem of fully decentralized multi-agent reinforcement learning (MARL), where the agents are located at the nodes of a time-varying communication network. Specifically, we assume that the reward functions of the agents might correspond to different tasks, and are only known to the corresponding agent. Moreover, each agent makes individual decisions based on both the information observed locally and the messages received from its neighbors over the network. Within this setting, the collective goal of the agents is to maximize the globally averaged return over the network through exchanging information with their neighbors. To this end, we propose two decentralized actor-critic algorithms with function approximation, which are applicable to large-scale MARL problems where both the number of states and the number of agents are massively large. Under the decentralized structure, the actor step is performed individually by each agent with no need to infer the policies of others. For the critic step, we propose a consensus update via communication over the network. Our algorithms are fully incremental and can be implemented in an online fashion. Convergence analyses of the algorithms are provided when the value functions are approximated within the class of linear functions. Extensive simulation results with both linear and nonlinear function approximations are presented to validate the proposed algorithms. Our work appears to be the first study of fully decentralized MARL algorithms for networked agents with function approximation, with provable convergence guarantees.
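The consensus step for the critic can be sketched as weighted averaging of parameter vectors among neighbors. The example below shows only that averaging step, under assumptions that are not taken from the paper: a fixed ring network, equal weights of 1/3 on self and each neighbor, and the hypothetical helper names `ring_weights` and `consensus_step`. The local temporal-difference update that would precede each averaging round is omitted.

```python
def ring_weights(n):
    """Doubly stochastic weights on a ring: 1/3 on self and on each neighbor."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            w[i][j % n] += 1.0 / 3.0
    return w


def consensus_step(params, weights):
    """Replace each agent's parameter vector with a weighted average of its neighbors'.

    params  : list of per-agent parameter vectors (lists of floats)
    weights : weights[i][j] is the weight agent i places on agent j (0 if not linked)
    """
    n = len(params)
    dim = len(params[0])
    return [
        [sum(weights[i][j] * params[j][d] for j in range(n)) for d in range(dim)]
        for i in range(n)
    ]


if __name__ == "__main__":
    critics = [[0.0], [4.0], [8.0], [12.0]]  # one scalar parameter per agent
    w = ring_weights(len(critics))
    for _ in range(50):
        critics = consensus_step(critics, w)
    print(critics)  # every agent ends up near the network-wide average
```

Because the weights are doubly stochastic, the network-wide average is preserved at every round, so agents that only talk to neighbors still agree on a common estimate.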


A Multi-Disciplinary Review of Knowledge Acquisition Methods: From Human to Autonomous Eliciting Agents

arXiv.org Artificial Intelligence

This paper offers a multi-disciplinary review of knowledge acquisition methods in human activity systems. The review captures the degree of involvement of various types of agencies in the knowledge acquisition process, and proposes a classification with three categories of methods: the human agent, the human-inspired agent, and the autonomous machine agent methods. In the first two categories, the acquisition of knowledge is seen as a cognitive task analysis exercise, while in the third category knowledge acquisition is treated as an autonomous knowledge-discovery endeavour. The motivation for this classification stems from the continuous change over time of the structure, meaning and purpose of human activity systems, which is seen as the factor that has fuelled researchers' and practitioners' efforts in knowledge acquisition for more than a century. We show through this review that the knowledge acquisition (KA) field is increasingly active due to the accelerating pace of change in human activity, and conclude by discussing the emergence of a fourth category of knowledge acquisition methods, which are based on red-teaming and co-evolution.


Learning to Gather without Communication

arXiv.org Machine Learning

A standard belief about emergent collective behavior is that it arises from simple individual rules. Most of the mathematical research on such collective behavior starts from imperative individual rules, like "always go to the center". But how could an (optimal) individual rule emerge during a short period within the group lifetime, especially if communication is not available? We argue that such rules can actually emerge in a group in a short span of time via collective (multi-agent) reinforcement learning, i.e., learning via rewards and punishments. We consider the gathering problem: several agents (social animals, swarming robots...) must gather around the same position, which is not determined in advance. They must do so without communicating their planned decisions, just by looking at the positions of other agents. We present the first experimental evidence that a gathering behavior can be learned without communication in a partially observable environment. The learned behavior has the same properties as a self-stabilizing distributed algorithm, as processes can gather from any initial state (and thus tolerate any transient failure). Besides, we show that it is possible to tolerate the abrupt loss of up to 90% of agents without significant impact on the behavior.


Personalized and Private Peer-to-Peer Machine Learning

arXiv.org Machine Learning

The rise of connected personal devices together with privacy concerns call for machine learning algorithms capable of leveraging the data of a large number of agents to learn personalized models under strong privacy requirements. In this paper, we introduce an efficient algorithm to address the above problem in a fully decentralized (peer-to-peer) and asynchronous fashion, with provable convergence rate. We show how to make the algorithm differentially private to protect against the disclosure of information about the personal datasets, and formally analyze the trade-off between utility and privacy. Our experiments show that our approach dramatically outperforms previous work in the non-private case, and that under privacy constraints, we can significantly improve over models learned in isolation.


Mean Field Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. When the number of agents increases greatly, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions. In this paper, we present Mean Field Reinforcement Learning, where the interactions within the population of agents are approximated by those between a single agent and the average effect from the overall population or neighboring agents; the interplay between the two entities is mutually reinforced: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution. Experiments on resource allocation, Ising model estimation, and battle game tasks verify the learning effectiveness of our mean field approaches in handling many-agent interactions in a population.
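The mean field approximation can be illustrated with a small tabular sketch. It assumes discrete actions, summarizes the neighbors by the empirical distribution of their actions, and keys the Q-table on the agent's own action plus that summary. The names `mean_action` and `mf_q_update`, the rounding used to build table keys, and the externally supplied next-state value `v_next` are simplifications introduced for this example, not the paper's algorithm.

```python
def mean_action(neighbor_actions, num_actions):
    """Average of the neighbors' one-hot actions, i.e. their empirical action distribution."""
    if not neighbor_actions:
        raise ValueError("at least one neighbor action is required")
    counts = [0] * num_actions
    for a in neighbor_actions:
        counts[a] += 1
    total = len(neighbor_actions)
    return [c / total for c in counts]


def mf_q_update(q, state, action, mean_act, reward, v_next, alpha=0.1, gamma=0.95):
    """Tabular update of Q(state, action, mean action) toward reward + gamma * v_next.

    q      : dict keyed by (state, action, rounded mean action)
    v_next : value estimate of the next state, supplied by the caller (assumption)
    """
    key = (state, action, tuple(round(m, 2) for m in mean_act))
    old = q.get(key, 0.0)
    q[key] = (1.0 - alpha) * old + alpha * (reward + gamma * v_next)
    return q[key]


if __name__ == "__main__":
    q_table = {}
    a_bar = mean_action([0, 0, 1, 2], num_actions=3)
    print(a_bar)
    print(mf_q_update(q_table, "s0", 1, a_bar, reward=1.0, v_next=2.0))
```

The table grows with the number of distinct mean actions rather than with the number of joint actions, which is the point of replacing many pairwise interactions with a single averaged one.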


Engineering Pro-Sociality With Autonomous Agents

AAAI Conferences

This paper envisions a future where autonomous agents are used to foster and support pro-social behavior in a hybrid society of humans and machines. Pro-social behavior occurs when people and agents perform costly actions that benefit others. Acts such as helping others voluntarily, donating to charity, providing information, or sharing resources are all forms of pro-social behavior. We discuss two questions that challenge a purely utilitarian view of human decision making and contextualize its role in hybrid societies: i) What are the conditions and mechanisms that lead societies of agents and humans to be more pro-social? ii) How can we engineer autonomous entities (agents and robots) that lead to more altruistic and cooperative behaviors in a hybrid society? We propose using social simulations, game theory, population dynamics, and studies with people in virtual or real environments (with robots) where both agents and humans interact. This research will constitute the basis for establishing the foundations for the new field of Pro-social Computing, aiming at understanding, predicting and promoting pro-sociality among humans, through artificial agents and multiagent systems.