marl
A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
There has been a resurgence of interest in multiagent reinforcement learning (MARL), due partly to the recent success of deep neural networks. The simplest form of MARL is independent reinforcement learning (InRL), where each agent treats all of its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit to the other agents' policies during training, failing to sufficiently generalize during execution. We introduce a new metric, joint-policy correlation, to quantify this effect. We describe a meta-algorithm for general MARL, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection.
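The abstract does not define joint-policy correlation, so the following is only a minimal sketch under stated assumptions. It assumes several independent InRL training runs of a two-player game, and a square matrix whose entry (i, j) is the mean return when player 1's policy from run i is paired with player 2's policy from run j. Under that assumption, diagonal entries pair policies that trained together and off-diagonal entries pair policies that never met. The function name `average_proportional_loss` and the summary statistic it computes are illustrative choices, not taken from the abstract.

```python
import numpy as np


def average_proportional_loss(jpc):
    """Relative drop in return when policies face partners from other runs.

    `jpc` is assumed to be a square matrix: entry (i, j) is the mean return
    of player 1's policy from run i paired with player 2's policy from run j.
    """
    jpc = np.asarray(jpc, dtype=float)
    if jpc.ndim != 2 or jpc.shape[0] != jpc.shape[1] or jpc.shape[0] < 2:
        raise ValueError("expected a square matrix with at least 2 runs")

    runs = jpc.shape[0]
    # Mean return for policies that were trained together (diagonal).
    trained_together = np.trace(jpc) / runs
    # Mean return for policies that were trained in different runs.
    trained_apart = (jpc.sum() - np.trace(jpc)) / (runs * (runs - 1))
    return (trained_together - trained_apart) / trained_together


# Hypothetical returns from two training runs (made-up numbers).
example = [[30.0, 20.0],
           [10.0, 30.0]]
loss = average_proportional_loss(example)
```

A value near zero would suggest the learned policies transfer well to unseen partners, while a large positive value would indicate the kind of overfitting to training partners that the abstract describes.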
Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization
Despite the success of single-agent reinforcement learning, multi-agent reinforcement learning (MARL) remains challenging due to complex interactions between agents. Motivated by decentralized applications such as sensor networks, swarm robotics, and power grids, we study policy evaluation in MARL, where agents with jointly observed state-action pairs and private local rewards collaborate to learn the value of a given policy. In this paper, we propose a double averaging scheme, where each agent iteratively performs averaging over both space and time to incorporate neighboring gradient information and local reward information, respectively. We prove that the proposed algorithm converges to the optimal solution at a global geometric rate. In particular, such an algorithm is built upon a primal-dual reformulation of the mean squared Bellman error minimization problem, which gives rise to a decentralized convex-concave saddle-point problem. To the best of our knowledge, the proposed double averaging primal-dual optimization algorithm is the first to achieve fast finite-time convergence on decentralized convex-concave saddle-point problems.
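The "averaging over space" half of the scheme can be illustrated with a generic consensus step. This is a minimal sketch and not the paper's algorithm: it assumes each agent holds one scalar and repeatedly replaces it with a weighted average of its neighbors' values, using a doubly stochastic mixing matrix. The four-agent ring, the weights in `W`, and the name `gossip_average` are all hypothetical.

```python
import numpy as np


def gossip_average(values, mixing, steps):
    """Repeatedly mix each agent's value with its neighbors' values.

    `mixing` is assumed to be doubly stochastic (rows and columns sum to 1),
    with nonzero entries only between neighboring agents.
    """
    x = np.asarray(values, dtype=float).copy()
    w = np.asarray(mixing, dtype=float)
    for _ in range(steps):
        x = w @ x  # each agent takes a weighted average over its neighborhood
    return x


# Hypothetical ring of 4 agents: each talks only to its two ring neighbors.
W = np.array([
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
])
local_values = np.array([1.0, 2.0, 3.0, 6.0])  # made-up private quantities
mixed = gossip_average(local_values, W, steps=50)
```

With a connected network and a doubly stochastic matrix like this one, every agent's value approaches the network-wide mean without any agent seeing all the private values. The time-averaging half of the scheme, which the abstract ties to local reward information, is not shown here.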