AITopics | Agents

Agents

A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning

Carion, Nicolas, Usunier, Nicolas, Synnaeve, Gabriel, Lazaric, Alessandro

Neural Information Processing Systems, Mar-18-2020, 23:48:18 GMT

Effective coordination is crucial to solve multi-agent collaborative (MAC) problems. While centralized reinforcement learning methods can optimally solve small MAC instances, they do not scale to large problems and they fail to generalize to scenarios different from those seen during training. In this paper, we consider MAC problems with some intrinsic notion of locality (e.g., geographic proximity) such that interactions between agents and tasks are locally limited. By leveraging this property, we introduce a novel structured prediction approach to assign agents to tasks. At each step, the assignment is obtained by solving a centralized optimization problem (the inference procedure) whose objective function is parameterized by a learned scoring model.
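The inference step described above, choosing an assignment that maximizes a scored objective, can be illustrated with a minimal sketch. It assumes a hypothetical score matrix standing in for the learned scoring model, a one-to-one agent-to-task constraint, and a brute-force search; none of these details come from the paper, and the names `best_assignment` and `scores` are illustrative only.

```python
from itertools import permutations


def best_assignment(scores):
    """Return (assignment, total) maximizing the summed score.

    scores[i][j] is a hypothetical score for assigning agent i to task j
    (a stand-in for the output of a learned scoring model).
    Assumes at least as many tasks as agents and one task per agent.
    Brute force: only viable for very small instances.
    """
    n_agents = len(scores)
    n_tasks = len(scores[0])
    best, best_total = None, float("-inf")
    # Enumerate every way to give each agent a distinct task.
    for perm in permutations(range(n_tasks), n_agents):
        total = sum(scores[i][perm[i]] for i in range(n_agents))
        if total > best_total:
            best, best_total = perm, total
    return list(best), best_total


# Two agents, three tasks; the values are made up for illustration.
scores = [
    [4.0, 1.0, 0.5],
    [3.5, 3.0, 0.2],
]
assignment, total = best_assignment(scores)
print(assignment, total)
```

Here both agents score highest on task 0, so the centralized search resolves the conflict by sending the second agent to task 1. A practical system would replace the enumeration with a scalable solver.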

cooperative multi-agent reinforcement learning, generalization, structured prediction approach, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.40)

MAVEN: Multi-Agent Variational Exploration

Mahajan, Anuj, Rashid, Tabish, Samvelyan, Mikayel, Whiteson, Shimon

Neural Information Processing Systems, Mar-18-2020, 23:33:38 GMT

Centralised training with decentralised execution is an important setting for cooperative deep multi-agent reinforcement learning due to communication constraints during execution and computational tractability in training. In this paper, we analyse value-based methods that are known to have superior performance in complex environments. We specifically focus on QMIX, the current state-of-the-art in this domain. We show that the representation constraints on the joint action-values introduced by QMIX and similar methods lead to provably poor exploration and suboptimality. Furthermore, we propose a novel approach called MAVEN that hybridises value and policy-based methods by introducing a latent space for hierarchical control. The value-based agents condition their behaviour on the shared latent variable controlled by a hierarchical policy.

constraint, maven, multi-agent variational exploration

Neural Information Processing Systems

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Unsupervised Emergence of Egocentric Spatial Structure from Sensorimotor Prediction

Laflaquière, Alban, Ortiz, Michael Garcia

Neural Information Processing Systems, Mar-18-2020, 23:31:12 GMT

Despite its omnipresence in robotics applications, the nature of spatial knowledge and the mechanisms that underlie its emergence in autonomous agents are still poorly understood. Recent theoretical works suggest that the Euclidean structure of space induces invariants in an agent's raw sensorimotor experience. We hypothesize that capturing these invariants is beneficial for sensorimotor prediction and that, under certain exploratory conditions, a motor representation capturing the structure of the external space should emerge as a byproduct of learning to predict future sensory experiences. We propose a simple sensorimotor predictive scheme, apply it to different agents and types of exploration, and evaluate the pertinence of these hypotheses. We show that a naive agent can capture the topology and metric regularity of its sensor's position in an egocentric spatial frame without any a priori knowledge or extraneous supervision.

egocentric spatial structure, sensorimotor prediction, unsupervised emergence, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.66)

No-Press Diplomacy: Modeling Multi-Agent Gameplay

Paquette, Philip, Lu, Yuchen, Bocco, Seton Steven, Smith, Max, O.-G., Satya, Kummerfeld, Jonathan K., Pineau, Joelle, Singh, Satinder, Courville, Aaron C.

Neural Information Processing Systems, Mar-18-2020, 22:16:47 GMT

Diplomacy is a seven-player non-stochastic, non-cooperative game, where agents acquire resources through a mix of teamwork and betrayal. Reliance on trust and coordination makes Diplomacy the first non-cooperative multi-agent benchmark for complex sequential social dilemmas in a rich environment. In this work, we focus on training an agent that learns to play the No Press version of Diplomacy where there is no dedicated communication channel between players. The model was trained on a new dataset of more than 150,000 human games. Our model is trained by supervised learning (SL) from expert trajectories, which is then used to initialize a reinforcement learning (RL) agent trained through self-play.

diplomacy, modeling multi-agent gameplay, no-press diplomacy

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning

Du, Yali, Han, Lei, Fang, Meng, Liu, Ji, Dai, Tianhong, Tao, Dacheng

Neural Information Processing Systems, Mar-18-2020, 22:16:25 GMT

A great challenge in cooperative decentralized multi-agent reinforcement learning (MARL) is generating diversified behaviors for each individual agent when receiving only a team reward. Prior studies have devoted much effort to reward shaping or to designing a centralized critic that can discriminatively credit the agents. In this paper, we propose to merge the two directions and learn, for each agent, an intrinsic reward function that diversely stimulates the agents at each time step. Specifically, the intrinsic reward for a given agent is used to compute a distinct proxy critic that directs the update of that agent's individual policy. Meanwhile, the parameterized intrinsic reward function is updated towards maximizing the expected accumulated team reward from the environment, so that the objective is consistent with the original MARL problem. The proposed method is referred to as learning individual intrinsic reward (LIIR) in MARL.

agent, learning individual intrinsic reward, multi-agent reinforcement learning, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits

Bar-On, Yogev, Mansour, Yishay

Neural Information Processing Systems, Mar-18-2020, 21:46:29 GMT

We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret minimization algorithms that guarantee for each agent $v$ an individual expected regret of $\widetilde{O}\left(\sqrt{\left(1+\frac{K}{\left|\mathcal{N}\left(v\right)\right|}\right)T}\right)$, where $T$ is the number of time steps, $K$ is the number of actions and $\mathcal{N}\left(v\right)$ is the set of neighbors of agent $v$ in the communication graph. We present algorithms both for the case that the communication graph is known to all the agents, and for the case that the graph is unknown. When the graph is unknown, each agent knows only the set of its neighbors and an upper bound on the total number of agents. The individual regret between the models differs only by a logarithmic factor.
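To get a feel for how the stated bound scales with the neighborhood size, the following sketch evaluates the leading term, the square root of (1 + K/|N(v)|) times T, for two hypothetical agents. It ignores the constants and logarithmic factors hidden by the tilde-O notation, takes the neighborhood size as a plain input (whether the set includes the agent itself is not stated here), and uses made-up values for K and T.

```python
import math


def regret_bound(num_actions, num_neighbors, horizon):
    """Leading term sqrt((1 + K/|N(v)|) * T), without constants or log factors."""
    if num_neighbors <= 0:
        raise ValueError("num_neighbors must be positive")
    return math.sqrt((1 + num_actions / num_neighbors) * horizon)


# Hypothetical setting: K = 100 actions, T = 10,000 steps.
isolated = regret_bound(num_actions=100, num_neighbors=1, horizon=10_000)
connected = regret_bound(num_actions=100, num_neighbors=100, horizon=10_000)
print(round(isolated, 1), round(connected, 1))
```

With as many neighbors as actions, the K/|N(v)| term drops to 1 and the leading term is on the order of the square root of 2T, far below the poorly connected case.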

agent, cooperative nonstochastic multi-armed bandit, individual regret, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.89)

A Crash Course in Game Theory for Machine Learning: Classic and New Ideas - KDnuggets

#artificialintelligence, Mar-18-2020, 21:06:12 GMT

Game theory is one of the most fascinating areas of mathematics and has influenced diverse fields such as economics, social sciences, biology and, obviously, computer science. Games are playing a key role in the evolution of artificial intelligence (AI). For starters, game environments are becoming a popular training mechanism in areas such as reinforcement learning or imitation learning. In theory, any multi-agent AI system can be subjected to gamified interactions between its participants. The branch of mathematics that formulates the principles of games is known as game theory.

ai system, game theory, participant, (14 more...)

#artificialintelligence

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Instructional Material > Course Syllabus & Notes (0.40)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)

Incremental Scene Synthesis

Planche, Benjamin, Rong, Xuejian, Wu, Ziyan, Karanam, Srikrishna, Kosch, Harald, Tian, YingLi, Ernst, Jan, Hutter, Andreas

Neural Information Processing Systems, Mar-18-2020, 21:02:42 GMT

We present a method to incrementally generate complete 2D or 3D scenes with the following properties: (a) it is globally consistent at each step according to a learned scene prior, (b) real observations of a scene can be incorporated while observing global consistency, (c) unobserved regions can be hallucinated locally, consistently with previous observations, hallucinations and global priors, and (d) hallucinations are statistical in nature, i.e., different scenes can be generated from the same observations. To achieve this, we model the virtual scene, where an active agent at each step can either perceive an observed part of the scene or generate a local hallucination. The latter can be interpreted as the agent's expectation at this step through the scene and can be applied to autonomous navigation. It can otherwise sample entirely imagined scenes from prior distributions. Besides autonomous agents, applications include problems where large amounts of data are required for building robust real-world applications, but few samples are available. We demonstrate efficacy on various 2D as well as 3D data.

agent, hallucination, incremental scene synthesis, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.63)

Finding Friend and Foe in Multi-Agent Games

Serrino, Jack, Kleiman-Weiner, Max, Parkes, David C., Tenenbaum, Josh

Neural Information Processing Systems, Mar-18-2020, 20:47:42 GMT

AI for multi-agent games like Go, Poker, and Dota has made great strides in recent years. Yet none of these games address the real-life challenge of cooperation in the presence of unknown and uncertain teammates. This challenge is a key game mechanism in hidden role games. Here we develop the DeepRole algorithm, a multi-agent reinforcement learning agent that we test on "The Resistance: Avalon", the most popular hidden role game. DeepRole combines counterfactual regret minimization (CFR) with deep value networks trained through self-play.
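CFR builds its strategies with a regret-matching rule applied at each decision point. The sketch below shows only that rule, not DeepRole or its value networks: actions are played in proportion to their positive cumulative regret, with a uniform fallback when no action has positive regret. The function name and the regret values are illustrative assumptions.

```python
def regret_matching(cumulative_regrets):
    """Turn cumulative regrets into a strategy (a probability distribution).

    Each action gets probability proportional to its positive regret.
    If no action has positive regret, fall back to the uniform strategy.
    """
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    n = len(cumulative_regrets)
    return [1.0 / n] * n


# Made-up cumulative regrets for three actions.
strategy = regret_matching([2.0, -1.0, 6.0])
uniform = regret_matching([-3.0, 0.0, -1.0])
print(strategy, uniform)
```

In a full CFR implementation this update runs at every information set, and the averaged strategy over iterations is what converges; those parts are omitted here.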

deeprole outperform, friend and foe, multi-agent game, (4 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning

Qu, Chao, Mannor, Shie, Xu, Huan, Qi, Yuan, Song, Le, Xiong, Junwu

Neural Information Processing Systems, Mar-18-2020, 20:47:21 GMT

We consider the networked multi-agent reinforcement learning (MARL) problem in a fully decentralized setting, where agents learn to coordinate to achieve joint success. This problem is widely encountered in many areas including traffic control, distributed control, and smart grids. We assume each agent is located at a node of a communication network and can exchange information only with its neighbors. Using softmax temporal consistency, we derive a primal-dual decentralized optimization method and obtain a principled and data-efficient iterative algorithm named value propagation. We prove a non-asymptotic convergence rate of $\mathcal{O}(1/T)$ with nonlinear function approximation.
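As background for the softmax consistency mentioned above, the sketch below computes the log-sum-exp "soft maximum" commonly used in entropy-regularized reinforcement learning, which replaces the hard max over action values with a smooth function. This is an assumption about the general technique, not the paper's exact formulation; the function name, the temperature parameter, and the action values are illustrative.

```python
import math


def soft_max_value(q_values, temperature):
    """Smooth maximum: temperature * log(sum(exp(q / temperature))).

    Computed in a numerically stable way by subtracting the largest value.
    Approaches max(q_values) as the temperature goes to zero.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(q_values)
    shifted = sum(math.exp((q - m) / temperature) for q in q_values)
    return m + temperature * math.log(shifted)


# Made-up action values for one state.
q = [1.0, 2.0, 3.0]
sharp = soft_max_value(q, temperature=0.01)   # close to the hard max
smooth = soft_max_value(q, temperature=1.0)   # noticeably above the hard max
print(sharp, smooth)
```

Unlike the hard max, this function is differentiable everywhere, which is one reason smoothed consistency conditions are convenient for gradient-based optimization.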

function approximation, networked deep multi-agent reinforcement learning, value propagation, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
