AITopics

2002.05706

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.81)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.65)

Jiang, Albert, Marcolino, Leandro Soriano, Procaccia, Ariel D., Sandholm, Tuomas, Shah, Nisarg, Tambe, Milind

Diverse Randomized Agents Vote to Win

Neural Information Processing Systems, Feb-14-2020, 10:15:18 GMT

We investigate the power of voting among diverse, randomized software agents. With teams of computer Go agents in mind, we develop a novel theoretical model of two-stage noisy voting that builds on recent work in machine learning. This model allows us to reason about a collection of agents with different biases (determined by the first-stage noise models), which, furthermore, apply randomized algorithms to evaluate alternatives and produce votes (captured by the second-stage noise models). We analytically demonstrate that a uniform team, consisting of multiple instances of any single agent, must make a significant number of mistakes, whereas a diverse team converges to perfection as the number of agents grows. Our experiments, which pit teams of computer Go agents against strong agents, provide evidence for the effectiveness of voting when agents are diverse.
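As a rough illustration of the voting idea in the abstract above, the sketch below aggregates the votes of a team by plurality. It is not the paper's two-stage noise model; the function name `plurality_vote`, the example moves, and the tie-breaking rule (the tied alternative that appears first in the vote list wins) are assumptions made only for this example.

```python
from collections import Counter


def plurality_vote(votes):
    """Return the alternative that received the most votes.

    Ties are broken in favor of the tied alternative that appears first
    in `votes` (an assumption of this sketch, not a rule from the paper).
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    best = max(counts.values())
    # Scan in the original order so the tie-break is deterministic.
    for vote in votes:
        if counts[vote] == best:
            return vote


# Hypothetical example: five agents each propose a Go move.
team_votes = ["D4", "Q16", "D4", "C3", "Q16"]
winner = plurality_vote(team_votes)
```

In this toy setting, a uniform team would correspond to sampling every vote from the same agent, while a diverse team would draw votes from agents with different biases.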

agent, diverse randomized agent vote, voting, (1 more...)


Technology:

Information Technology > Artificial Intelligence > Games > Go (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)

León, Borja G., Belardinelli, Francesco

Extended Markov Games to Learn Multiple Tasks in Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence, Feb-14-2020

[...] Learning (RL) has recently attracted interest as a way for single-agent RL to learn multiple-task specifications. In this paper we extend this convergence to multi-agent settings and formally define Extended Markov Games as a general mathematical model that allows multiple RL agents to concurrently learn various non-Markovian specifications. To introduce this new model we provide formal definitions and proofs as well as empirical tests of RL algorithms running on this framework. Specifically, we use our model to train two different logic-based multi-agent RL algorithms to solve diverse settings of non-Markovian co-safe LTL specifications.

This paper focuses on formally extending Markov Games (MGs), the mathematical model traditionally used in MARL, to build a new general model, i.e., one not focused solely on one kind of multi-agent game, that allows multiple learning agents to concurrently fulfill various non-Markovian specifications in multi-agent settings. To support our model with empirical evidence, we also extended two logic-based RL algorithms to multi-agent systems in order to show how various learning agents can fulfill different types of non-Markovian specifications expressed in co-safe Linear-time Temporal Logic (LTL). Our results are promising and point to [...]

agent, algorithm, specification, (12 more...)

2002.06

Country: Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

arXiv.org Machine Learning, Feb-11-2020

Learning Structured Communication for Multi-agent Reinforcement Learning

Sheng, Junjie, Wang, Xiangfeng, Jin, Bo, Yan, Junchi, Li, Wenhao, Chang, Tsung-Hui, Wang, Jun, Zha, Hongyuan

This work explores large-scale multi-agent communication mechanisms in a multi-agent reinforcement learning (MARL) setting. We summarize the general categories of topology for communication structures in the MARL literature, which are often manually specified. We then propose a novel framework, termed Learning Structured Communication (LSC), that uses a more flexible and efficient communication topology. Our framework allows for adaptive agent grouping to form different hierarchical formations over episodes, which is generated by an auxiliary task combined with a hierarchical routing protocol. Given each formed topology, a hierarchical graph neural network is learned to enable effective message generation and propagation across inter- and intra-group communications. In contrast to existing communication mechanisms, our method has an explicit yet learnable design for hierarchical communication. Experiments on challenging tasks show that the proposed LSC achieves high communication efficiency, scalability, and global cooperation capability.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2002.04235

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.50)

Industry: Telecommunications (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)

arXiv.org Machine Learning, Feb-10-2020

Q-Learning for Mean-Field Controls

Gu, Haotian, Guo, Xin, Wei, Xiaoli, Xu, Renyuan

Multi-agent reinforcement learning (MARL) has been applied to many challenging problems, including two-team computer games, autonomous driving, and real-time bidding. Despite the empirical success, there is a conspicuous absence of theoretical study of different MARL algorithms: this is mainly due to the curse of dimensionality caused by the exponential growth of the joint state-action space as the number of agents increases. Mean-field controls (MFC) with infinitely many agents and deterministic flows, meanwhile, provide good approximations to $N$-agent collaborative games in terms of both game values and optimal strategies. In this paper, we study collaborative MARL under an MFC approximation framework: we develop a model-free kernel-based Q-learning algorithm (CDD-Q) and show that its convergence rate and sample complexity are independent of the number of agents. Our empirical studies on MFC examples demonstrate the strong performance of CDD-Q. Moreover, the CDD-Q algorithm can be applied to a general class of Markov decision problems (MDPs) with deterministic dynamics and continuous state-action space.
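For readers unfamiliar with Q-learning, the sketch below shows the textbook tabular update that model-free Q-learning methods build on. It is not CDD-Q, which the abstract describes as kernel-based and designed for continuous state-action spaces; the function name `q_update`, the dictionary-based table, and the default values of `alpha` and `gamma` are assumptions chosen for illustration.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q       : dict mapping (state, action) -> estimated value (missing = 0.0)
    actions : iterable of actions available in `next_state`
    alpha   : learning rate (assumed default)
    gamma   : discount factor (assumed default)
    """
    # Greedy value of the successor state under the current estimates.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Move the old estimate toward the one-step bootstrapped target.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Hypothetical two-action example on integer-labelled states.
q_table = {}
first = q_update(q_table, state=0, action="right", reward=1.0,
                 next_state=1, actions=["left", "right"])
```

A table indexed by discrete states does not scale to continuous spaces, which is presumably why the abstract emphasizes a kernel-based approximation instead.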

algorithm, assumption 3, theorem 3, (13 more...)

2002.04131

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > Santa Clara County > Stanford (0.04)
(2 more...)

Genre: Research Report (0.40)

Industry:

Information Technology (0.87)
Transportation > Ground > Road (0.87)
Leisure & Entertainment > Games (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)

arXiv.org Artificial Intelligence, Feb-6-2020

Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach

Xue, Zeyue, Luo, Shuang, Wu, Chao, Zhou, Pan, Bian, Kaigui, Du, Wei

Peer-to-peer knowledge transfer in distributed environments has emerged as a promising method, since it can accelerate learning and improve team-wide performance without relying on pre-trained teachers in deep reinforcement learning. However, traditional peer-to-peer methods such as action advising have difficulty expressing knowledge and advice efficiently. We therefore propose a new solution that reuses experiences and transfers value functions among multiple students via model distillation. Transferring a Q-function directly remains challenging, however, because it is unstable and unbounded. To address this issue, which existing works also face, we adopt the Categorical Deep Q-Network. We also describe how to design an efficient communication protocol to exploit heterogeneous knowledge among multiple distributed agents. Our proposed framework, Learning and Teaching Categorical Reinforcement (LTCR), shows promising performance in stabilizing and accelerating learning progress, with improved team-wide reward in four typical experimental environments.
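One common way to distill between models that output categorical distributions, as a Categorical Deep Q-Network does over return atoms, is to minimize the Kullback-Leibler divergence from a teacher distribution to a student distribution. The abstract does not state LTCR's loss, so the sketch below is only an assumed illustration; the function name `kl_divergence` and the clipping constant `eps` are hypothetical.

```python
import math


def kl_divergence(teacher, student, eps=1e-12):
    """Return KL(teacher || student) for two probability vectors.

    Both arguments are sequences of probabilities over the same atoms.
    Student probabilities are clipped below by `eps` (an assumption of
    this sketch) to avoid dividing by zero.
    """
    if len(teacher) != len(student):
        raise ValueError("distributions must have the same number of atoms")
    # Atoms with zero teacher probability contribute nothing to the sum.
    return sum(p * math.log(p / max(q, eps))
               for p, q in zip(teacher, student) if p > 0.0)


# Hypothetical example: a confident teacher and an undecided student.
loss = kl_divergence([1.0, 0.0], [0.5, 0.5])
```

Because categorical outputs are bounded probability vectors, a loss of this kind stays well defined even where raw Q-values would be unbounded.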

agent, distillation, model distillation, (14 more...)

2002.02202

Country: North America > United States > Arkansas > Washington County > Fayetteville (0.04)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

arXiv.org Artificial Intelligence, Jan-31-2020

Neural MMO v1.3: A Massively Multiagent Game Environment for Training and Evaluating Neural Networks

Suarez, Joseph, Du, Yilun, Mordach, Igor, Isola, Phillip

Progress in multiagent intelligence research is fundamentally limited by the number and quality of environments available for study. In recent years, simulated games have become a dominant research platform within reinforcement learning, in part due to their accessibility and interpretability. Previous works have targeted and demonstrated success on arcade, first-person shooter (FPS), real-time strategy (RTS), and multiplayer online battle arena (MOBA) games. Our work considers massively multiplayer online role-playing games (MMORPGs or MMOs), which capture several complexities of real-world learning that are not well modeled by any other game genre. We present Neural MMO, a massively multiagent game environment inspired by MMOs, and discuss our progress on two more general challenges in multiagent systems engineering for AI research: distributed infrastructure and game IO. We further demonstrate that standard policy gradient methods and simple baseline models can learn interesting emergent exploration and specialization behaviors in this setting.

agent, neural mmo, reinforcement, (14 more...)

2001.12004

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Asghari, Seyed Mohammad, Ouyang, Yi, Nayyar, Ashutosh

Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems

arXiv.org Machine Learning, Jan-27-2020

Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL), primarily due to the dynamical environments and the decentralized information among agents. We attempt to solve this challenge in the context of decentralized learning in multi-agent linear-quadratic (LQ) dynamical systems. We begin with a simple setup consisting of two agents and two dynamically decoupled stochastic linear systems, each system controlled by an agent. The systems are coupled through a quadratic cost function. When both systems' dynamics are unknown and there is no communication among the agents, we show that no learning policy can achieve regret that is sub-linear in $T$, where $T$ is the time horizon. When only one system's dynamics are unknown and there is one-directional communication from the agent controlling the unknown system to the other agent, we propose a MARL algorithm based on the construction of an auxiliary single-agent LQ problem. The auxiliary single-agent problem in the proposed MARL algorithm serves as an implicit coordination mechanism between the two learning agents. This allows the agents to achieve a regret within $O(\sqrt{T})$ of the regret of the auxiliary single-agent problem. Consequently, using existing results for single-agent LQ regret, our algorithm provides a $\tilde{O}(\sqrt{T})$ regret bound. (Here $\tilde{O}(\cdot)$ hides constants and logarithmic factors.) Our numerical experiments indicate that this bound is matched in practice. From the two-agent problem, we extend our results to multi-agent LQ systems with certain communication patterns.
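To make the regret statements in the abstract above concrete, the block below writes out a definition of regret that is common in the LQ learning literature, together with what "sub-linear" and $\tilde{O}(\sqrt{T})$ mean. The symbols $c_t$, $J^{*}$, $C$, and $k$ are assumed notation introduced here; the paper's exact definition (for example, whether the bound holds in expectation or with high probability) may differ.

```latex
% c_t : cost incurred at time t under the learning policy (assumed notation)
% J^* : optimal long-run average cost when the dynamics are known (assumed notation)
R(T) = \sum_{t=0}^{T-1} c_t - T \, J^{*}

% Sub-linear regret: the average per-step gap to the optimum vanishes
\lim_{T \to \infty} \frac{R(T)}{T} = 0

% \tilde{O}(\sqrt{T}) regret: for some constants C, k > 0 and all sufficiently large T
R(T) \le C \, \sqrt{T} \, (\log T)^{k}
```

Under this reading, the negative result says that without communication no policy can drive $R(T)/T$ to zero, while the proposed algorithm does so at a rate of roughly $1/\sqrt{T}$.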

algorithm, survey article, upstream oil & gas, (18 more...)

2001.10122

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Leisure & Entertainment (0.67)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.65)

Robohub, Jan-26-2020, 23:59:33 GMT

Emergent behavior by minimizing chaos

All living organisms carve out environmental niches within which they can maintain relative predictability amidst the ever-increasing entropy around them (1), (2). Humans, for example, go to great lengths to shield themselves from surprise -- we band together in millions to build cities with homes, supplying water, food, gas, and electricity to control the deterioration of our bodies and living spaces amidst heat and cold, wind and storm. The need to discover and maintain such surprise-free equilibria has driven great resourcefulness and skill in organisms across very diverse natural habitats. Motivated by this, we ask: could the motive of preserving order amidst chaos guide the automatic acquisition of useful behaviors in artificial agents? This central problem in artificial intelligence has evoked several candidate solutions, largely focusing on novelty-seeking behaviors (3), (4), (5).

agent, entropy, novelty, (13 more...)


Industry: Leisure & Entertainment > Games (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.42)

Kumar, Rajiv Ranjan, Varakantham, Pradeep

On Solving Cooperative MARL Problems with a Few Good Experiences

arXiv.org Artificial Intelligence, Jan-22-2020

Cooperative Multi-agent Reinforcement Learning (MARL) is crucial for cooperative decentralized decision learning in many domains, such as search and rescue, drone surveillance, package delivery, and fire fighting. In these domains, a key challenge is learning with a few good experiences, i.e., positive reinforcements are obtained only in a few situations (e.g., on extinguishing a fire, tracking a crime, or delivering a package), and in most other situations there is zero or negative reinforcement. Learning decisions with a few good experiences is extremely challenging in cooperative MARL problems for three reasons. First, compared to the single-agent case, exploration is harder because multiple agents have to be coordinated to receive a good experience. Second, the environment is not stationary because all the agents are learning at the same time (and hence changing policies). Third, the scale of the problem increases significantly with every additional agent. Relevant existing work is extensive and has focussed on dealing with a few good experiences in single-agent RL problems or on scalable approaches for handling non-stationarity in MARL problems. Unfortunately, neither of these approaches (or their extensions) is able to address the problem of sparse good experiences effectively. Therefore, we provide a novel fictitious self-imitation approach that is able to simultaneously handle non-stationarity and sparse good experiences in a scalable manner. Finally, we provide a thorough comparison (experimental or descriptive) against relevant cooperative MARL algorithms to demonstrate the utility of our approach.

agent, good experience, nfsip, (12 more...)

2001.07993

Country: Asia > Singapore (0.04)

Genre: Research Report (0.50)

Industry: Law Enforcement & Public Safety > Fire & Emergency Services (0.89)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)