AITopics | Agent Societies

Agent Societies

Federated Reinforcement Learning: Techniques, Applications, and Open Challenges

Qi, Jiaju, Zhou, Qihao, Lei, Lei, Zheng, Kan

arXiv.org Artificial Intelligence, Aug-26-2021

This paper presents a comprehensive survey of Federated Reinforcement Learning (FRL), an emerging and promising field in Reinforcement Learning (RL). Starting with a tutorial of Federated Learning (FL) and RL, we then focus on the introduction of FRL as a new method with great potential by leveraging the basic idea of FL to improve the performance of RL while preserving data-privacy. According to the distribution characteristics of the agents in the framework, FRL algorithms can be divided into two categories, i.e. Horizontal Federated Reinforcement Learning (HFRL) and Vertical Federated Reinforcement Learning (VFRL). We provide the detailed definitions of each category by formulas, investigate the evolution of FRL from a technical perspective, and highlight its advantages over previous RL algorithms. In addition, the existing works on FRL are summarized by application fields, including edge computing, communication, control optimization, and attack detection. Finally, we describe and discuss several key research directions that are crucial to solving the open problems within FRL.

agent, algorithm, reinforcement, (15 more...)

arXiv.org Artificial Intelligence

2108.11887

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > Canada > Ontario > Toronto (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
(2 more...)

Genre:

Research Report (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Energy > Power Industry (1.00)
Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
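
The abstract above says FRL borrows the basic idea of FL so that RL agents improve together without exchanging raw data. A minimal sketch of that idea for the horizontal case follows. It assumes tabular Q-learning agents that share one state and action space, and it uses a FedAvg-style weighted average. The function `federated_average`, the `"state|action"` key format, and the choice to treat missing entries as 0.0 are illustrative assumptions, not details taken from the survey.

```python
from typing import Dict, List

QTable = Dict[str, float]  # hypothetical layout: "state|action" key -> Q-value


def federated_average(local_tables: List[QTable], sample_counts: List[int]) -> QTable:
    """Weighted average of local Q-tables; only parameters leave each agent."""
    total = sum(sample_counts)
    keys = set().union(*local_tables)
    return {
        # Assumption: an entry an agent never visited counts as 0.0.
        key: sum(n * table.get(key, 0.0)
                 for table, n in zip(local_tables, sample_counts)) / total
        for key in keys
    }


# Two hypothetical agents trained in separate copies of the same environment.
agent_a = {"s0|left": 1.0, "s0|right": 3.0}
agent_b = {"s0|left": 2.0, "s0|right": 5.0}

# The coordinator sees Q-values and sample counts, never the trajectories.
global_q = federated_average([agent_a, agent_b], sample_counts=[100, 300])
```

Weighting by sample count lets the agent with more experience pull the shared table further toward its own estimates. A real system would typically add secure aggregation or noise on top of this step.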

Settling the Variance of Multi-Agent Policy Gradients

Kuba, Jakub Grudzien, Wen, Muning, Yang, Yaodong, Meng, Linghui, Gu, Shangding, Zhang, Haifeng, Mguni, David Henry, Wang, Jun

arXiv.org Artificial Intelligence, Aug-20-2021

Policy gradient (PG) methods are popular reinforcement learning (RL) methods where a baseline is often applied to reduce the variance of gradient estimates. In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents. In this paper, we offer a rigorous analysis of MAPG methods by, firstly, quantifying the contributions of the number of agents and agents' explorations to the variance of MAPG estimators. Based on this analysis, we derive the optimal baseline (OB) that achieves the minimal variance. In comparison to the OB, we measure the excess variance of existing MARL algorithms such as vanilla MAPG and COMA. Considering using deep neural networks, we also propose a surrogate version of OB, which can be seamlessly plugged into any existing PG methods in MARL. On benchmarks of Multi-Agent MuJoCo and StarCraft challenges, our OB technique effectively stabilises training and improves the performance of multi-agent PPO and COMA algorithms by a significant margin.

baseline, estimator, variance, (15 more...)

arXiv.org Artificial Intelligence

2108.08612

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
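
The abstract above relies on the fact that subtracting a baseline leaves a policy gradient estimate unbiased while changing its variance. The sketch below checks this exactly, by enumeration rather than sampling, for a single agent with two actions. It uses the classical single-agent minimum-variance baseline, b* = E[r * score^2] / E[score^2], not the multi-agent optimal baseline derived in the paper. The rewards, the action probability, and the function names are hypothetical.

```python
from typing import List, Tuple


def gradient_moments(probs: List[float], rewards: List[float],
                     scores: List[float], baseline: float) -> Tuple[float, float]:
    """Exact mean and variance of the estimator (r - b) * score under the policy."""
    samples = [(r - baseline) * s for r, s in zip(rewards, scores)]
    mean = sum(p * g for p, g in zip(probs, samples))
    second_moment = sum(p * g * g for p, g in zip(probs, samples))
    return mean, second_moment - mean * mean


def min_variance_baseline(probs: List[float], rewards: List[float],
                          scores: List[float]) -> float:
    """Classical single-agent result: b* = E[r * score^2] / E[score^2]."""
    num = sum(p * r * s * s for p, r, s in zip(probs, rewards, scores))
    den = sum(p * s * s for p, s in zip(probs, scores))
    return num / den


# Two-action policy with P(action 1) = 0.3, parameterised by a single logit.
p1 = 0.3
probs = [1 - p1, p1]
scores = [-p1, 1 - p1]   # d/d(logit) of log pi(a), i.e. a - p1 for a in {0, 1}
rewards = [10.0, 11.0]   # hypothetical deterministic rewards

mean_plain, var_plain = gradient_moments(probs, rewards, scores, baseline=0.0)
b_star = min_variance_baseline(probs, rewards, scores)
mean_base, var_base = gradient_moments(probs, rewards, scores, baseline=b_star)
```

Both estimators have the same mean because the score has zero expectation under the policy, so the baseline term drops out. Only the variance differs, and it is much smaller with `b_star` than with no baseline.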

Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction

Zheng, Fang, Wang, Le, Zhou, Sanping, Tang, Wei, Niu, Zhenxing, Zheng, Nanning, Hua, Gang

arXiv.org Artificial Intelligence, Aug-16-2021

Understanding complex social interactions among agents is a key challenge for trajectory prediction. Most existing methods consider the interactions between pairwise traffic agents or in a local area, while the nature of interactions is unlimited, involving an uncertain number of agents and non-local areas simultaneously. Besides, they treat interactions among heterogeneous traffic agents, namely those among agents of different categories, the same, while neglecting people's diverse reaction patterns toward traffic agents in different categories. To address these problems, we propose a simple yet effective Unlimited Neighborhood Interaction Network (UNIN), which predicts trajectories of heterogeneous agents in multiple categories. Specifically, the proposed unlimited neighborhood interaction module generates the fused features of all agents involved in an interaction simultaneously, which is adaptive to any number of agents and any range of interaction area. Meanwhile, a hierarchical graph attention module is proposed to obtain category-to-category interaction and agent-to-agent interaction. Finally, parameters of a Gaussian Mixture Model are estimated for generating the future trajectories. Extensive experimental results on benchmark datasets demonstrate a significant performance improvement of our method over the state-of-the-art methods.

agent, interaction, prediction, (14 more...)

arXiv.org Artificial Intelligence

2108.00238

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > China > Guangxi Province > Nanning (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)

Q-Mixing Network for Multi-Agent Pathfinding in Partially Observable Grid Environments

Davydov, Vasilii, Skrynnik, Alexey, Yakovlev, Konstantin, Panov, Aleksandr I.

arXiv.org Artificial Intelligence, Aug-13-2021

In this paper, we consider the problem of multi-agent navigation in partially observable grid environments. This problem is challenging for centralized planning approaches as they typically rely on full knowledge of the environment. We suggest utilizing the reinforcement learning approach, in which the agents first learn policies that map observations to actions and then follow these policies to reach their goals. To tackle the challenge associated with learning cooperative behavior, i.e. in many cases agents need to yield to each other to accomplish a mission, we use a mixing Q-network that complements learning individual policies. In the experimental evaluation, we show that such an approach leads to plausible results and scales well to a large number of agents.

agent, algorithm, reinforcement, (15 more...)

arXiv.org Artificial Intelligence

2108.06148

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.05)
Asia > Russia (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
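
The mixing Q-network mentioned in the abstract above combines per-agent Q-values into one joint value. The sketch below shows the monotonic mixing idea used by QMIX-style methods, which is assumed here rather than confirmed as the paper's exact architecture. When the mixing weights are non-negative, each agent can act greedily on its own Q-values and the joint action still maximises the mixed value. The Q-values, weights, and the `mix` function are hypothetical, and a real mixer would produce its weights from a state-conditioned network.

```python
from itertools import product
from typing import List


def mix(agent_qs: List[float], weights: List[float], bias: float) -> float:
    """Monotonic mixer: abs() keeps every weight non-negative."""
    return sum(abs(w) * q for w, q in zip(weights, agent_qs)) + bias


# Hypothetical per-agent Q-values for actions (stay, up, down),
# each computed from that agent's local observation only.
q_values = [
    [0.1, 0.7, 0.2],  # agent 0
    [0.5, 0.3, 0.9],  # agent 1
]
weights, bias = [1.5, -0.4], 0.2  # fixed here; learned and state-dependent in practice

# Decentralised execution: each agent picks its own greedy action.
greedy = tuple(max(range(len(q)), key=q.__getitem__) for q in q_values)

# Centralised check: brute-force search over every joint action.
best_joint = max(
    product(range(3), repeat=len(q_values)),
    key=lambda joint: mix([q[a] for q, a in zip(q_values, joint)], weights, bias),
)
```

The brute-force search grows exponentially with the number of agents, whereas the per-agent argmax stays linear. That difference is why the monotonicity constraint matters for scaling.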

The surprising effectiveness of PPO in cooperative multi-agent games

AIHub, Aug-6-2021, 12:00:00 GMT

Recent years have demonstrated the potential of deep multi-agent reinforcement learning (MARL) to train groups of AI agents that can collaborate to solve complex tasks. For instance, AlphaStar achieved professional-level performance in the StarCraft II video game, and OpenAI Five defeated the world champion in Dota 2. These successes, however, were powered by huge swaths of computational resources: tens of thousands of CPUs, hundreds of GPUs, and even TPUs were used to collect and train on a large volume of data. This has motivated the academic MARL community to develop MARL methods which train more efficiently. Research in developing more efficient and effective MARL algorithms has focused on off-policy methods, which store and re-use data for multiple policy updates, rather than on-policy algorithms, which use newly collected training data before each update to the agents' policies.

artificial intelligence, deep learning, machine learning, (15 more...)

AIHub

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.41)

Model-Based Opponent Modeling

Yu, Xiaopeng, Jiang, Jiechuan, Jiang, Haobin, Lu, Zongqing

arXiv.org Artificial Intelligence, Aug-4-2021

When one agent interacts with a multi-agent environment, it is challenging to deal with various opponents unseen before. Modeling the behaviors, goals, or beliefs of opponents could help the agent adjust its policy to adapt to different opponents. In addition, it is also important to consider opponents who are learning simultaneously or capable of reasoning. However, existing work usually tackles only one of the aforementioned types of opponent. In this paper, we propose model-based opponent modeling (MBOM), which employs the environment model to adapt to all kinds of opponent. MBOM simulates the recursive reasoning process in the environment model and imagines a set of improving opponent policies. To effectively and accurately represent the opponent policy, MBOM further mixes the imagined opponent policies according to the similarity with the real behaviors of opponents. Empirically, we show that MBOM achieves more effective adaptation than existing methods in competitive and cooperative environments, respectively with different types of opponent, i.e., fixed policy, naïve learner, and reasoning learner.

agent, international conference, opponent, (14 more...)

arXiv.org Artificial Intelligence

2108.01843

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

A purely data-driven framework for prediction, optimization, and control of networked processes: application to networked SIS epidemic model

Tavasoli, Ali, Henry, Teague, Shakeri, Heman

arXiv.org Artificial Intelligence, Jul-31-2021

Networks are landmarks of many complex phenomena where interweaving interactions between different agents transform simple local rule-sets into nonlinear emergent behaviors. While some recent studies unveil associations between the network structure and the underlying dynamical process, identifying stochastic nonlinear dynamical processes continues to be an outstanding problem. Here we develop a simple data-driven framework based on operator-theoretic techniques to identify and control stochastic nonlinear dynamics taking place over large-scale networks. The proposed approach requires no prior knowledge of the network structure and identifies the underlying dynamics solely using a collection of two-step snapshots of the states. This data-driven system identification is achieved by using the Koopman operator to find a low dimensional representation of the dynamical patterns that evolve linearly. Further, we use the global linear Koopman model to solve critical control problems by applying it to model predictive control (MPC), typically a challenging proposition when applied to large networks. We show that our proposed approach tackles this by converting the original nonlinear programming into a more tractable optimization problem that is both convex and with far fewer variables.

operator, optimization problem, upstream oil & gas, (20 more...)

arXiv.org Artificial Intelligence

2108.02005

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
Asia > Middle East > Iran (0.14)

Genre: Research Report (1.00)

Industry:

Energy > Oil & Gas > Upstream (1.00)
Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
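
The abstract above identifies dynamics from pairs of consecutive state snapshots by fitting a linear (Koopman) model. The sketch below shows the simplest version of that fit: a least-squares linear operator K with y ≈ K x, which is the dynamic mode decomposition special case with identity observables. It assumes a noiseless 2-D linear system, whereas the paper targets stochastic nonlinear network dynamics and presumably uses richer observables. The function names and `true_k` are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def fit_linear_operator(pairs: List[Tuple[Vec, Vec]]) -> List[List[float]]:
    """Least-squares K for 2-D snapshot pairs (x, y): K = (Y X^T)(X X^T)^-1."""
    g = [[0.0, 0.0], [0.0, 0.0]]  # G = sum of x x^T
    c = [[0.0, 0.0], [0.0, 0.0]]  # C = sum of y x^T
    for x, y in pairs:
        for i in range(2):
            for j in range(2):
                g[i][j] += x[i] * x[j]
                c[i][j] += y[i] * x[j]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    if abs(det) < 1e-12:
        raise ValueError("snapshots do not span the state space")
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]
    # K = C @ G^-1, written out for the 2x2 case.
    return [[sum(c[i][k] * g_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def step(k: List[List[float]], x: Vec) -> Vec:
    """Advance the state one step under the linear map k."""
    return (k[0][0] * x[0] + k[0][1] * x[1],
            k[1][0] * x[0] + k[1][1] * x[1])


# Hypothetical ground-truth dynamics, used only to generate snapshot pairs.
true_k = [[0.9, 0.1], [-0.2, 0.8]]
states = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]
pairs = [(x, step(true_k, x)) for x in states]

k_hat = fit_linear_operator(pairs)
```

The fit uses only the snapshot pairs and needs no knowledge of how the components of the state are coupled. Once K is available, a linear model predictive controller can be posed as a convex problem, which matches the tractability argument in the abstract.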

Strategically Efficient Exploration in Competitive Multi-agent Reinforcement Learning

Loftin, Robert, Saha, Aadirupa, Devlin, Sam, Hofmann, Katja

arXiv.org Artificial Intelligence, Jul-30-2021

High sample complexity remains a barrier to the application of reinforcement learning (RL), particularly in multi-agent systems. A large body of work has demonstrated that exploration mechanisms based on the principle of optimism under uncertainty can significantly improve the sample efficiency of RL in single agent tasks. This work seeks to understand the role of optimistic exploration in non-cooperative multi-agent settings. We will show that, in zero-sum games, optimistic exploration can cause the learner to waste time sampling parts of the state space that are irrelevant to strategic play, as they can only be reached through cooperation between both players. To address this issue, we introduce a formal notion of strategically efficient exploration in Markov games, and use this to develop two strategically efficient learning algorithms for finite Markov games. We demonstrate that these methods can be significantly more sample efficient than their optimistic counterparts.

algorithm, exploration, strategic ulcb, (14 more...)

arXiv.org Artificial Intelligence

2107.14698

Country:

Europe > Netherlands > South Holland > Delft (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)

Survey of Recent Multi-Agent Reinforcement Learning Algorithms Utilizing Centralized Training

Sharma, Piyush K., Fernandez, Rolando, Zaroukian, Erin, Dorothy, Michael, Basak, Anjon, Asher, Derrik E.

arXiv.org Artificial Intelligence, Jul-29-2021

Much work has been dedicated to the exploration of Multi-Agent Reinforcement Learning (MARL) paradigms implementing a centralized learning with decentralized execution (CLDE) approach to achieve human-like collaboration in cooperative tasks. Here, we discuss variations of centralized training and describe a recent survey of algorithmic approaches. The goal is to explore how different implementations of information sharing mechanisms in centralized learning may give rise to distinct group coordinated behaviors in multi-agent systems performing cooperative tasks.

agent, algorithm, learning, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1117/12.2585808

2107.14316

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Maryland > Prince George's County > Adelphi (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Overview (1.00)

Industry:

Government > Military > Army (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games

Hambly, Ben, Xu, Renyuan, Yang, Huining

arXiv.org Machine Learning, Jul-27-2021

Policy optimization algorithms have achieved substantial empirical successes in addressing a variety of non-cooperative multi-agent problems, including self-driving vehicles [17], real-time bidding games [8], and optimal execution in financial markets [6]. However, there have been few results from a theoretical perspective showing why such a class of reinforcement learning algorithms performs well in the presence of competition among agents. As a starting point to tackle this challenging problem, we investigate linear-quadratic games (LQGs), which can be seen as a generalization of the linear-quadratic regulator (LQR) from a single agent to multiple agents. In an LQG, all agents jointly control a linear state process, which may be high-dimensional, where the control (or action) from each individual agent has a linear impact on the state process. Each agent optimizes a quadratic cost function which depends on the state process, the control from this agent and/or the controls from the opponents.

gradient method, lemma 3, natural policy gradient algorithm, (11 more...)

arXiv.org Machine Learning

2107.1309

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)
