AITopics

2010.05042

Country:

South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Vaccines (0.34)
Health & Medicine > Therapeutic Area > Immunology (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)

arXiv.org Artificial IntelligenceOct-11-2020

A DRL-based Multiagent Cooperative Control Framework for CAV Networks: a Graphic Convolution Q Network

Dong, Jiqian, Chen, Sikai, Ha, Paul Young Joun, Li, Yujie, Labi, Samuel

Connected Autonomous Vehicle (CAV) Network can be defined as a collection of CAVs operating at different locations on a multilane corridor, which provides a platform to facilitate the dissemination of operational information as well as control instructions. Cooperation is crucial in CAV operating systems since it can greatly enhance operation in terms of safety and mobility, and high-level cooperation between CAVs can be expected by jointly plan and control within CAV network. However, due to the highly dynamic and combinatory nature such as dynamic number of agents (CAVs) and exponentially growing joint action space in a multiagent driving task, achieving cooperative control is NP hard and cannot be governed by any simple rule-based methods. In addition, existing literature contains abundant information on autonomous driving's sensing technology and control logic but relatively little guidance on how to fuse the information acquired from collaborative sensing and build decision processor on top of fused information. In this paper, a novel Deep Reinforcement Learning (DRL) based approach combining Graphic Convolution Neural Network (GCN) and Deep Q Network (DQN), namely Graphic Convolution Q network (GCQ) is proposed as the information fusion module and decision processor. The proposed model can aggregate the information acquired from collaborative sensing and output safe and cooperative lane changing decisions for multiple CAVs so that individual intention can be satisfied even under a highly dynamic and partially observed mixed traffic. The proposed algorithm can be deployed on centralized control infrastructures such as road-side units (RSU) or cloud platforms to improve the CAV operation.

artificial intelligence, information, machine learning, (17 more...)

2010.05437

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > District of Columbia > Washington (0.14)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.05)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.05)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Transportation > Infrastructure & Services (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.88)

arXiv.org Machine LearningOct-7-2020

Learning Theory for Inferring Interaction Kernels in Second-Order Interacting Agent Systems

Miller, Jason, Tang, Sui, Zhong, Ming, Maggioni, Mauro

Modeling the complex interactions of systems of particles or agents is a fundamental scientific and mathematical problem that is studied in diverse fields, ranging from physics and biology, to economics and machine learning. In this work, we describe a very general second-order, heterogeneous, multivariable, interacting agent model, with an environment, that encompasses a wide variety of known systems. We describe an inference framework that uses nonparametric regression and approximation theory based techniques to efficiently derive estimators of the interaction kernels which drive these dynamical systems. We develop a complete learning theory which establishes strong consistency and optimal nonparametric min-max rates of convergence for the estimators, as well as provably accurate predicted trajectories. The estimators exploit the structure of the equations in order to overcome the curse of dimensionality and we describe a fundamental coercivity condition on the inverse problem which ensures that the kernels can be learned and relates to the minimal singular value of the learning matrix. The numerical algorithm presented to build the estimators is parallelizable, performs well on high-dimensional problems, and is demonstrated on complex dynamical systems.

artificial intelligence, kk null, machine learning, (17 more...)

2010.03729

Country:

North America > United States > New York (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.84)

arXiv.org Artificial IntelligenceOct-6-2020

UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning

Gupta, Tarun, Mahajan, Anuj, Peng, Bei, Böhmer, Wendelin, Whiteson, Shimon

This paper focuses on cooperative value-based multi-agent reinforcement learning (MARL) in the paradigm of centralized training with decentralized execution (CTDE). Current state-of-the-art value-based MARL methods leverage CTDE to learn a centralized joint-action value function as a monotonic mixing of each agent's utility function, which enables easy decentralization. However, this monotonic restriction leads to inefficient exploration in tasks with nonmonotonic returns due to suboptimal approximations of the values of joint actions. To address this, we present a novel MARL approach called Universal Value Exploration (UneVEn), which uses universal successor features (USFs) to learn policies of tasks related to the target task, but with simpler reward functions in a sample efficient manner. UneVEn uses novel action-selection schemes between randomly sampled related tasks during exploration, which enables the monotonic joint-action value function of the target task to place more importance on useful joint actions. Empirical results on a challenging cooperative predator-prey task requiring significant coordination amongst agents show that UneVEn significantly outperforms state-of-the-art baselines.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2010.02974

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Kayaalp, Mert, Vlaski, Stefan, Sayed, Ali H.

Dif-MAML: Decentralized Multi-Agent Meta-Learning

arXiv.org Artificial IntelligenceOct-6-2020

The objective of meta-learning is to exploit the knowledge obtained from observed tasks to improve adaptation to unseen tasks. As such, meta-learners are able to generalize better when they are trained with a larger number of observed tasks and with a larger amount of data per task. Given the amount of resources that are needed, it is generally difficult to expect the tasks, their respective data, and the necessary computational capacity to be available at a single central location. It is more natural to encounter situations where these resources are spread across several agents connected by some graph topology. The formalism of meta-learning is actually well-suited to this decentralized setting, where the learner would be able to benefit from information and computational power spread across the agents. Motivated by this observation, in this work, we propose a cooperative fully-decentralized multi-agent meta-learning algorithm, referred to as Diffusion-based MAML or Dif-MAML. Decentralized optimization algorithms are superior to centralized implementations in terms of scalability, avoidance of communication bottlenecks, and privacy guarantees. The work provides a detailed theoretical analysis to show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective even in non-convex environments. Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.

artificial intelligence, launch model, machine learning, (15 more...)

doi: 10.1109/OJSP.2021.3140000

2010.0287

Country:

North America > United States > California > Los Angeles County > Santa Monica (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(9 more...)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Oren, Nir, Meneguzzi, Felipe

Norm Identification through Plan Recognition

arXiv.org Artificial IntelligenceOct-6-2020

Societal rules, as exemplified by norms, aim to provide a degree of behavioural stability to multi-agent societies. Norms regulate a society using the deontic concepts of permissions, obligations and prohibitions to specify what can, must and must not occur in a society. Many implementations of normative systems assume various combinations of the following assumptions: that the set of norms is static and defined at design time; that agents joining a society are instantly informed of the complete set of norms; that the set of agents within a society does not change; and that all agents are aware of the existing norms. When any one of these assumptions is dropped, agents need a mechanism to identify the set of norms currently present within a society, or risk unwittingly violating the norms. In this paper, we develop a norm identification mechanism that uses a combination of parsing-based plan recognition and Hierarchical Task Network (HTN) planning mechanisms, which operates by analysing the actions performed by other agents. While our basic mechanism cannot learn in situations where norm violations take place, we describe an extension which is able to operate in the presence of violations.

artificial intelligence, plan recognition, planning & scheduling, (15 more...)

2010.02627

Country:

South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > Scotland (0.04)

Genre: Research Report (0.40)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling > Plan Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)

Suri, Karush, Shi, Xiao Qi, Plataniotis, Konstantinos, Lawryshyn, Yuri

Energy-based Surprise Minimization for Multi-Agent Value Factorization

arXiv.org Machine LearningOct-5-2020

Multi-Agent Reinforcement Learning (MARL) has demonstrated significant success in training decentralised policies in a centralised manner by making use of value factorization methods. However, addressing surprise across spurious states and approximation bias remain open problems for multi-agent settings. We introduce the Energy-based MIXer (EMIX), an algorithm which minimizes surprise utilizing the energy across agents. Our contributions are threefold; (1) EMIX introduces a novel surprise minimization technique across multiple agents in the case of multi-agent partially-observable settings. (2) EMIX highlights the first practical use of energy functions in MARL (to our knowledge) with theoretical guarantees and experiment validations of the energy operator. Lastly, (3) EMIX presents a novel technique for addressing overestimation bias across agents in MARL. When evaluated on a range of challenging StarCraft II micromanagement scenarios, EMIX demonstrates consistent state-of-the-art performance for multi-agent surprise minimization. Moreover, our ablation study highlights the necessity of the energy-based scheme and the need for elimination of overestimation bias in MARL. Our implementation of EMIX and videos of agents are available at https://karush17.github.io/emix-web/.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2009.09842

Country:

North America > United States (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Machine LearningOct-5-2020

QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning

Son, Kyunghwan, Ahn, Sungsoo, Reyes, Roben Delos, Shin, Jinwoo, Yi, Yung

QTRAN is a multi-agent reinforcement learning (MARL) algorithm capable of learning the largest class of joint-action value functions up to date. However, despite its strong theoretical guarantee, it has shown poor empirical performance in complex environments, such as Starcraft Multi-Agent Challenge (SMAC). In this paper, we identify the performance bottleneck of QTRAN and propose a substantially improved version, coined QTRAN++. Our gains come from (i) stabilizing the training objective of QTRAN, (ii) removing the strict role separation between the action-value estimators of QTRAN, and (iii) introducing a multi-head mixing network for value transformation. Through extensive evaluation, we confirm that our diagnosis is correct, and QTRAN++ successfully bridges the gap between empirical performance and theoretical guarantee. In particular, QTRAN++ newly achieves state-of-the-art performance in the SMAC environment. The code will be released.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2006.1201

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Machine LearningOct-4-2020

RODE: Learning Roles to Decompose Multi-Agent Tasks

Wang, Tonghan, Gupta, Tarun, Mahajan, Anuj, Peng, Bei, Whiteson, Shimon, Zhang, Chongjie

Role-based learning holds the promise of achieving scalable multi-agent learning by decomposing complex tasks using roles. However, it is largely unclear how to efficiently discover such a set of roles. To solve this problem, we propose to first decompose joint action spaces into restricted role action spaces by clustering actions according to their effects on the environment and other agents. Learning a role selector based on action effects makes role discovery much easier because it forms a bi-level learning hierarchy -- the role selector searches in a smaller role space and at a lower temporal resolution, while role policies learn in significantly reduced primitive action-observation spaces. We further integrate information about action effects into the role policies to boost learning efficiency and policy generalization. By virtue of these advances, our method (1) outperforms the current state-of-the-art MARL algorithms on 10 of the 14 scenarios that comprise the challenging StarCraft II micromanagement benchmark and (2) achieves rapid transfer to new environments with three times the number of agents. Demonstrative videos are available at https://sites.google.com/view/rode-marl .

action space, artificial intelligence, machine learning, (12 more...)

2010.01523

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)

arXiv.org Machine LearningOct-4-2020

Off-Policy Multi-Agent Decomposed Policy Gradients

Wang, Yihan, Han, Beining, Wang, Tonghan, Dong, Heng, Zhang, Chongjie

Multi-agent policy gradient (MAPG) methods recently witness vigorous progress. However, there is a significant performance discrepancy between MAPG methods and state-of-the-art multi-agent value-based approaches. In this paper, we investigate causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP). This method introduces the idea of value function decomposition into the multi-agent actor-critic framework. Based on this idea, DOP supports efficient off-policy learning and addresses the issue of centralized-decentralized mismatch and credit assignment in both discrete and continuous action spaces. We formally show that DOP critics have sufficient representational capability to guarantee convergence. In addition, empirical evaluations on the StarCraft II micromanagement benchmark and multi-agent particle environments demonstrate that DOP significantly outperforms both state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms. Demonstrative videos are available at https://sites.google.com/view/dop-mapg/.

artificial intelligence, international conference, machine learning, (16 more...)

2007.12322

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)