Decentralized Learning in General-sum Markov Games

Maheshwari, Chinmay, Wu, Manxi, Sastry, Shankar

Sep-15-2024–arXiv.org Artificial Intelligence

The Markov game framework is widely used to model interactions among agents with heterogeneous utilities in dynamic, uncertain, societal-scale systems. In these settings, agents typically operate in a decentralized manner due to privacy and scalability concerns, often without knowledge of others' strategies. Designing decentralized learning algorithms that provably converge to rational outcomes remains challenging, especially beyond Markov zero-sum and potential games, which do not fully capture the mixed cooperative-competitive nature of real-world interactions. Our paper focuses on designing decentralized learning algorithms for general-sum Markov games, aiming to provide guarantees of convergence to approximate Nash equilibria. We introduce a Markov Near-Potential Function (MNPF), and show that MNPF plays a central role in the analysis of convergence of an actor-critic-based decentralized learning dynamics to approximate Nash equilibria. Our analysis leverages the two-timescale nature of actor-critic algorithms, where Q-function updates occur faster than policy updates. This result is further strengthened under certain regularity conditions and when the set of Nash equilibria is finite. Our findings provide a new perspective on the analysis of decentralized learning in multi-agent systems, addressing the complexities of real-world interactions.

algorithm, equilibrium, trajectory, (15 more...)

arXiv.org Artificial Intelligence

Sep-15-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - California > Alameda County
    - Berkeley (0.04)
  - New York > Tompkins County
    - Ithaca (0.04)

Genre:
- Research Report (0.70)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Representation & Reasoning > Agents (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found