Agent Societies
Credit Assignment For Collective Multiagent RL With Global Rewards
Nguyen, Duc Thien, Kumar, Akshat, Lau, Hoong Chuin
Scaling decision theoretic planning to large multiagent systems is challenging due to uncertainty and partial observability in the environment. We focus on a multiagent planning model subclass, relevant to urban settings, where agent interactions are dependent on their ``collective influence'' on each other, rather than their identities. Unlike previous work, we address a general setting where system reward is not decomposable among agents. We develop collective actor-critic RL approaches for this setting, and address the problem of multiagent credit assignment, and computing low variance policy gradient estimates that result in faster convergence to high quality solutions. We also develop difference rewards based credit assignment methods for the collective setting. Empirically our new approaches provide significantly better solutions than previous methods in the presence of global rewards on two real world problems modeling taxi fleet optimization and multiagent patrolling, and a synthetic grid navigation domain.
Learning Attentional Communication for Multi-Agent Cooperation
Communication could potentially be an effective way for multi-agent cooperation. However, information sharing among all agents or in predefined communication architectures that existing methods adopt can be problematic. When there is a large number of agents, agents cannot differentiate valuable information that helps cooperative decision making from globally shared information. Therefore, communication barely helps, and could even impair the learning of multi-agent cooperation. Predefined communication architectures, on the other hand, restrict communication among agents and thus restrain potential cooperation. To tackle these difficulties, in this paper, we propose an attentional communication model that learns when communication is needed and how to integrate shared information for cooperative decision making. Our model leads to efficient and effective communication for large-scale multi-agent cooperation. Empirically, we show the strength of our model in a variety of cooperative scenarios, where agents are able to develop more coordinated and sophisticated strategies than existing methods.
Inequity aversion improves cooperation in intertemporal social dilemmas
Hughes, Edward, Leibo, Joel Z., Phillips, Matthew, Tuyls, Karl, Dueรฑez-Guzman, Edgar, Castaรฑeda, Antonio Garcรญa, Dunning, Iain, Zhu, Tina, McKee, Kevin, Koster, Raphael, Roff, Heather, Graepel, Thore
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.
Learning Attentional Communication for Multi-Agent Cooperation
Communication could potentially be an effective way for multi-agent cooperation. However, information sharing among all agents or in predefined communication architectures that existing methods adopt can be problematic. When there is a large number of agents, agents cannot differentiate valuable information that helps cooperative decision making from globally shared information. Therefore, communication barely helps, and could even impair the learning of multi-agent cooperation. Predefined communication architectures, on the other hand, restrict communication among agents and thus restrain potential cooperation. To tackle these difficulties, in this paper, we propose an attentional communication model that learns when communication is needed and how to integrate shared information for cooperative decision making. Our model leads to efficient and effective communication for large-scale multi-agent cooperation. Empirically, we show the strength of our model in a variety of cooperative scenarios, where agents are able to develop more coordinated and sophisticated strategies than existing methods.
A Summary of Adaptation of Techniques from Search-based Optimal Multi-Agent Path Finding Solvers to Compilation-based Approach
In the multi-agent path finding problem (MAPF) we are given a set of agents each with respective start and goal positions. The task is to find paths for all agents while avoiding collisions aiming to minimize an objective function. Two such common objective functions is the sum-of-costs and the makespan. Many optimal solvers were introduced in the past decade - two prominent categories of solvers can be distinguished: search-based solvers and compilation-based solvers. Search-based solvers were developed and tested for the sum-of-costs objective while the most prominent compilation-based solvers that are built around Boolean satisfiability (SAT) were designed for the makespan objective. Very little was known on the performance and relevance of the compilation-based approach on the sum-of-costs objective. In this paper we show how to close the gap between these cost functions in the compilation-based approach. Moreover we study applicability of various techniques developed for search-based solvers in the compilation-based approach. A part of this paper introduces a SAT-solver that is directly aimed to solve the sum-of-costs objective function. Using both a lower bound on the sum-of-costs and an upper bound on the makespan, we are able to have a reasonable number of variables in our SAT encoding. We then further improve the encoding by borrowing ideas from ICTS, a search-based solver. Experimental evaluation on several domains show that there are many scenarios where our new SAT-based methods outperforms the best variants of previous sum-of-costs search solvers - the ICTS, CBS algorithms, and ICBS algorithms.
On Improving Decentralized Hysteretic Deep Reinforcement Learning
Lu, Xueguang, Amato, Christopher
Recent successes of value-based multi-agent deep reinforcement learning employ optimism by limiting underestimation updates of value function estimator, through carefully controlled learning rate (Omidshafiei et al., 2017) or reduced update probability (Palmer et al., 2018). To achieve full cooperation when learning independently, an agent must estimate the state values contingent on having optimal teammates; therefore, value overestimation is frequency injected to counteract negative effects caused by unobservable teammate sub-optimal policies and explorations. Aiming to solve this issue through automatic scheduling, this paper introduces a decentralized quantile estimator, which we found empirically to be more stable, sample efficient and more likely to converge to the joint optimal policy.
Nonparametric inference of interaction laws in systems of agents from trajectory data
Lu, Fei, Maggioni, Mauro, Tang, Sui, Zhong, Ming
Inferring the laws of interaction between particles and agents in complex dynamical systems from observational data is a fundamental challenge in a wide variety of disciplines. We propose a non-parametric statistical learning approach to estimate the governing laws of distance-based interactions, with no reference or assumption about their analytical form, from data consisting trajectories of interacting agents. We demonstrate the effectiveness of our learning approach both by providing theoretical guarantees, and by testing the approach on a variety of prototypical systems in various disciplines. These systems include homogeneous and heterogeneous agents systems, ranging from particle systems in fundamental physics to agent-based systems modeling opinion dynamics under the social influence, prey-predator dynamics, flocking and swarming, and phototaxis in cell dynamics.
Vision-based Navigation with Language-based Assistance via Imitation Learning with Indirect Intervention
Nguyen, Khanh, Dey, Debadeepta, Brockett, Chris, Dolan, Bill
We present Vision-based Navigation with Language-based Assistance (VNLA), a grounded vision-language task where an agent with visual perception is guided via language to find objects in photorealistic indoor environments. The task emulates a real-world scenario in that (a) the requester may not know how to navigate to the target objects and thus makes requests by only specifying high-level endgoals, and (b) the agent is capable of sensing when it is lost and querying an advisor, who is more qualified at the task, to obtain language subgoals to make progress. To model language-based assistance, we develop a general framework termed Imitation Learning with Indirect Intervention (I3L), and propose a solution that is effective on the VNLA task. Empirical results show that this approach significantly improves the success rate of the learning agent over other baselines on both seen and unseen environments.
Learning Sharing Behaviors with Arbitrary Numbers of Agents
Metcalf, Katherine, Theobald, Barry-John, Apostoloff, Nicholas
We propose a method for modeling and learning turn-taking behaviors for accessing a shared resource. We model the individual behavior for each agent in an interaction and then use a multi-agent fusion model to generate a summary over the expected actions of the group to render the model independent of the number of agents. The individual behavior models are weighted finite state transducers (WFSTs) with weights dynamically updated during interactions, and the multi-agent fusion model is a logistic regression classifier. We test our models in a multi-agent tower-building environment, where a Q-learning agent learns to interact with rule-based agents. Our approach accurately models the underlying behavior patterns of the rule-based agents with accuracy ranging between 0.63 and 1.0 depending on the stochasticity of the other agent behaviors. In addition we show using KL-divergence that the model accurately captures the distribution of next actions when interacting with both a single agent (KL-divergence < 0.1) and with multiple agents (KL-divergence < 0.37). Finally, we demonstrate that our behavior model can be used by a Q-learning agent to take turns in an interactive turn-taking environment.
Hierarchical Fuzzy Opinion Networks: Top-Down for Social Organizations and Bottom-Up for Election
A fuzzy opinion is a Gaussian fuzzy set with the center representing the opinion and the standard deviation representing the uncertainty about the opinion, and a fuzzy opinion network is a connection of a number of fuzzy opinions in a structured way. In this paper, we propose: (a) a top-down hierarchical fuzzy opinion network to model how the opinion of a top leader is penetrated into the members in social organizations, and (b) a bottom-up fuzzy opinion network to model how the opinions of a large number of agents are agglomerated layer-by-layer into a consensus or a few opinions in the social processes such as an election. For the top-down hierarchical fuzzy opinion network, we prove that the opinions of all the agents converge to the leaders opinion, but the uncertainties of the agents in different groups are generally converging to different values. We demonstrate that the speed of convergence is greatly improved by organizing the agents in a hierarchical structure of small groups. For the bottom-up hierarchical fuzzy opinion network, we simulate how a wide spectrum of opinions are negotiating and summarizing with each other in a layer-by-layer fashion in some typical situations.