AITopics | Agent Societies

Collaborating Authors

Agent Societies

News Overviews Instructional Materials AI-Alerts Classics

Escaping the State of Nature: A Hobbesian Approach to Cooperation in Multi-agent Reinforcement Learning

arXiv.org Artificial IntelligenceJun-5-2019

Cooperation is a phenomenon that has been widely studied across many different disciplines. In the field of computer science, the modularity and robustness of multi-agent systems offer significant practical advantages over individual machines. At the same time, agents using standard reinforcement learning algorithms often fail to achieve long-term, cooperative strategies in unstable environments when there are short-term incentives to defect. Political philosophy, on the other hand, studies the evolution of cooperation in humans who face similar incentives to act individualistically, but nevertheless succeed in forming societies. Thomas Hobbes in Leviathan provides the classic analysis of the transition from a pre-social State of Nature, where consistent defection results in a constant state of war, to stable political community through the institution of an absolute Sovereign. This thesis argues that Hobbes's natural and moral philosophy are strikingly applicable to artificially intelligent agents and aims to show that his political solutions are experimentally successful in producing cooperation among modified Q-Learning agents. Cooperative play is achieved in a novel Sequential Social Dilemma called the Civilization Game, which models the State of Nature by introducing the Hobbesian mechanisms of opponent learning awareness and majoritarian voting, leading to the establishment of a Sovereign.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1906.09874

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games (1.00)
Law (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)

Add feedback

Exploration with Unreliable Intrinsic Reward in Multi-Agent Reinforcement Learning

Böhmer, Wendelin, Rashid, Tabish, Whiteson, Shimon

arXiv.org Artificial IntelligenceJun-5-2019

This paper investigates the use of intrinsic reward to guide exploration in multi-agent reinforcement learning. We discuss the challenges in applying intrinsic reward to multiple collaborative agents and demonstrate how unreliable reward can prevent decentralized agents from learning the optimal policy. We address this problem with a novel framework, Independent Centrally-assisted Q-learning (ICQL), in which decentralized agents share control and an experience replay buffer with a centralized agent. Only the centralized agent is intrinsically rewarded, but the decentralized agents still benefit from improved exploration, without the distraction of unreliable incentives.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

1906.02138

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.91)

Industry: Leisure & Entertainment > Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.88)

Add feedback

Learning Transferable Cooperative Behavior in Multi-Agent Teams

Agarwal, Akshat, Kumar, Sumit, Sycara, Katia

arXiv.org Machine LearningJun-4-2019

While multi-agent interactions can be naturally modeled as a graph, the environment has traditionally been considered as a black box. We propose to create a shared agent-entity graph, where agents and environmental entities form vertices, and edges exist between the vertices which can communicate with each other. Agents learn to cooperate by exchanging messages along the edges of this graph. Our proposed multi-agent reinforcement learning framework is invariant to the number of agents or entities present in the system as well as permutation invariance, both of which are desirable properties for any multi-agent system representation. We present state-of-the-art results on coverage, formation and line control tasks for multi-agent teams in a fully decentralized framework and further show that the learned policies quickly transfer to scenarios with different team sizes along with strong zero-shot generalization performance. This is an important step towards developing multi-agent teams which can be realistically deployed in the real world without assuming complete prior knowledge or instantaneous communication at unbounded distances.

agent, artificial intelligence, communication, (14 more...)

arXiv.org Machine Learning

1906.01202

Country: North America > United States (0.05)

Genre: Research Report (0.50)

Industry: Transportation (0.35)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)

Add feedback

Attentional Policies for Cross-Context Multi-Agent Reinforcement Learning

Wright, Matthew A., Horowitz, Roberto

arXiv.org Machine LearningMay-31-2019

Many potential applications of reinforcement learning in the real world involve interacting with other agents whose numbers vary over time. We propose new neural policy architectures for these multi-agent problems. In contrast to other methods of training an individual, discrete policy for each agent and then enforcing cooperation through some additional inter-policy mechanism, we follow the spirit of recent work on the power of relational inductive biases in deep networks by learning multi-agent relationships at the policy level via an attentional architecture. In our method, all agents share the same policy, but independently apply it in their own context to aggregate the other agents' state information when selecting their next action. The structure of our architectures allow them to be applied on environments with varying numbers of agents. We demonstrate our architecture on a benchmark multi-agent autonomous vehicle coordination problem, obtaining superior results to a full-knowledge, fully-centralized reference solution, and significantly outperforming it when scaling to large numbers of agents.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1905.13428

Country:

Europe (0.93)
North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry: Transportation (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.87)

Add feedback

Convergence Analysis of Gradient-Based Learning with Non-Uniform Learning Rates in Non-Cooperative Multi-Agent Settings

Chasnov, Benjamin, Ratliff, Lillian J., Mazumdar, Eric, Burden, Samuel A.

arXiv.org Artificial IntelligenceMay-30-2019

Considering a class of gradient-based multi-agent learning algorithms in non-cooperative settings, we provide local convergence guarantees to a neighborhood of a stable local Nash equilibrium. In particular, we consider continuous games where agents learn in (i) deterministic settings with oracle access to their gradient and (ii) stochastic settings with an unbiased estimator of their gradient. Utilizing the minimum and maximum singular values of the game Jacobian, we provide finite-time convergence guarantees in the deterministic case. On the other hand, in the stochastic case, we provide concentration bounds guaranteeing that with high probability agents will converge to a neighborhood of a stable local Nash equilibrium in finite time. Different than other works in this vein, we also study the effects of non-uniform learning rates on the learning dynamics and convergence rates. We find that much like preconditioning in optimization, non-uniform learning rates cause a distortion in the vector field which can, in turn, change the rate of convergence and the shape of the region of attraction. The analysis is supported by numerical examples that illustrate different aspects of the theory. We conclude with discussion of the results and open questions.

agent, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1906.00731

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Reinforcement Learning for Mean Field Game

Tiwari, Nilay, Ghosh, Arnob, Aggarwal, Vaneet

arXiv.org Artificial IntelligenceMay-30-2019

Stochastic games provide a framework for interactions among multi-agents and enable a myriad of applications. In these games, agents decide on actions simultaneously, the state of an agent moves to the next state, and each agent receives a reward. However, finding an equilibrium (if exists) in this game is often difficult when the number of agents become large. This paper focuses on finding a mean-field equilibrium (MFE) in an action coupled stochastic game setting in an episodic framework. It is assumed that the impact of the other agents' can be assumed by the empirical distribution of the mean of the actions. All agents know the action distribution and employ lower-myopic best response dynamics to choose the optimal oblivious strategy. This paper proposes a posterior sampling based approach for reinforcement learning in the mean-field game, where each agent samples a transition probability from the previous transitions. We show that the policy and action distributions converge to the optimal oblivious strategy and the limiting distribution, respectively, which constitute a MFE.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1905.13357

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.49)

Add feedback

Deep Reinforcement Learning for Event-Driven Multi-Agent Decision Processes

Menda, Kunal, Chen, Yi-Chun, Grana, Justin, Bono, James W., Tracey, Brendan D., Kochenderfer, Mykel J., Wolpert, David

arXiv.org Artificial IntelligenceMay-29-2019

The incorporation of macro-actions (temporally extended actions) into multi-agent decision problems has the potential to address the curse of dimensionality associated with such decision problems. Since macro-actions last for stochastic durations, multiple agents executing decentralized policies in cooperative environments must act asynchronously. We present an algorithm that modifies generalized advantage estimation for temporally extended actions, allowing a state-of-the-art policy optimization algorithm to optimize policies in Dec-POMDPs in which agents act asynchronously. We show that our algorithm is capable of learning optimal policies in two cooperative domains, one involving real-time bus holding control and one involving wildfire fighting with unmanned aircraft. Our algorithm works by framing problems as "event-driven decision processes," which are scenarios in which the sequence and timing of actions and events are random and governed by an underlying stochastic process. In addition to optimizing policies with continuous state and action spaces, our algorithm also facilitates the use of event-driven simulators, which do not require time to be discretized into time-steps. We demonstrate the benefit of using event-driven simulation in the context of multiple agents taking asynchronous actions. We show that fixed time-step simulation risks obfuscating the sequence in which closely separated events occur, adversely affecting the policies learned. In addition, we show that arbitrarily shrinking the time-step scales poorly with the number of agents.

agent, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TITS.2018.2848264

1709.06656

Country: North America > United States > California (0.46)

Genre: Research Report (0.82)

Industry:

Transportation (0.47)
Government (0.46)
Aerospace & Defense > Aircraft (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)

Add feedback

Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning

Iqbal, Shariq, Sha, Fei

arXiv.org Artificial IntelligenceMay-28-2019

Sparse rewards are one of the most important challenges in reinforcement learning. In the single-agent setting, these challenges have been addressed by introducing intrinsic rewards that motivate agents to explore unseen regions of their state spaces. Applying these techniques naively to the multi-agent setting results in individual agents exploring independently, without any coordination among themselves. We argue that learning in cooperative multi-agent settings can be accelerated and improved if agents coordinate with respect to what they have explored. In this paper we propose an approach for learning how to dynamically select between different types of intrinsic rewards which consider not just what an individual agent has explored, but all agents, such that the agents can coordinate their exploration and maximize extrinsic returns. Concretely, we formulate the approach as a hierarchical policy where a high-level controller selects among sets of policies trained on different types of intrinsic rewards and the low-level controllers learn the action policies of all agents under these specific rewards. We demonstrate the effectiveness of the proposed approach in a multi-agent learning domain with sparse rewards.

agent, artificial intelligence, machine learning, (13 more...)

arXiv.org Artificial Intelligence

1905.12127

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.49)

Add feedback

Actor-Attention-Critic for Multi-Agent Reinforcement Learning

Iqbal, Shariq, Sha, Fei

arXiv.org Artificial IntelligenceMay-27-2019

Reinforcement learning in multi-agent scenarios is important for real-world applications but presents challenges beyond those seen in single-agent settings. We present an actor-critic algorithm that trains decentralized policies in multi-agent settings, using centrally computed critics that share an attention mechanism which selects relevant information for each agent at every timestep. This attention mechanism enables more effective and scalable learning in complex multi-agent environments, when compared to recent approaches. Our approach is applicable not only to cooperative settings with shared rewards, but also individualized reward settings, including adversarial settings, as well as settings that do not provide global states, and it makes no assumptions about the action spaces of the agents. As such, it is flexible enough to be applied to most multi-agent learning problems.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1810.02912

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

A Hybrid Algorithm for Metaheuristic Optimization

Khanna, Sujit Pramod, Ororbia, Alexander II

arXiv.org Artificial IntelligenceMay-26-2019

We propose a novel, flexible algorithm for combining together metaheuristic optimizers for non-convex optimization problems. Our approach treats the constituent optimizers as a team of complex agents that communicate information amongst each other at various intervals during the simulation process. The information produced by each individual agent can be combined in various ways via higher-level operators. In our experiments on key benchmark functions, we investigate how the performance of our algorithm varies with respect to several of its key modifiable properties. Finally, we apply our proposed algorithm to classification problems involving the optimization of support-vector machine classifiers.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1906.0201

Country: North America > United States (0.14)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Add feedback