Decentralized Learning in General-sum Markov Games
Maheshwari, Chinmay; Wu, Manxi; Sastry, Shankar
arXiv.org Artificial Intelligence
The Markov game framework is widely used to model interactions among agents with heterogeneous utilities in dynamic, uncertain, societal-scale systems. In these settings, agents typically operate in a decentralized manner due to privacy and scalability concerns, often without knowledge of others' strategies. Designing decentralized learning algorithms that provably converge to rational outcomes remains challenging, especially beyond Markov zero-sum and potential games, which do not fully capture the mixed cooperative-competitive nature of real-world interactions. Our paper focuses on designing decentralized learning algorithms for general-sum Markov games, with the aim of guaranteeing convergence to approximate Nash equilibria. We introduce a Markov Near-Potential Function (MNPF) and show that the MNPF plays a central role in analyzing the convergence of actor-critic-based decentralized learning dynamics to approximate Nash equilibria. Our analysis leverages the two-timescale nature of actor-critic algorithms, in which Q-function updates occur faster than policy updates. The convergence result is further strengthened under certain regularity conditions and when the set of Nash equilibria is finite. Our findings provide a new perspective on the analysis of decentralized learning in multi-agent systems, addressing the complexities of real-world interactions.
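The two-timescale idea mentioned in the abstract (Q-function updates running faster than policy updates) can be illustrated with a minimal sketch of step-size schedules. This is not the paper's algorithm or its step-size choice: the function names `critic_step` and `actor_step` and the exponents 0.6 and 1.0 are hypothetical, chosen only so that the ratio of the slow step to the fast step shrinks toward zero, which is the usual way a timescale separation is expressed in stochastic approximation.

```python
# Illustrative two-timescale step sizes (assumed schedules, not from the paper).
# The critic (Q-function) uses the larger, slower-decaying step, so it moves
# "faster" than the actor (policy), whose step decays more quickly.

def critic_step(t: int, exponent: float = 0.6) -> float:
    """Hypothetical fast-timescale step size for Q-function updates."""
    return 1.0 / (t + 1) ** exponent


def actor_step(t: int, exponent: float = 1.0) -> float:
    """Hypothetical slow-timescale step size for policy updates."""
    return 1.0 / (t + 1) ** exponent


# Timescale separation: actor_step(t) / critic_step(t) should decrease toward 0.
checkpoints = [1, 10, 100, 1000, 10000]
ratios = [actor_step(t) / critic_step(t) for t in checkpoints]

for t, r in zip(checkpoints, ratios):
    print(f"t={t:>5}  actor/critic step ratio = {r:.4f}")
```

With these assumed exponents the ratio equals (t + 1)^(-0.4), so from the policy's point of view the Q-function estimates look nearly converged, while from the Q-function's point of view the policy looks nearly stationary.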
Sep-15-2024
- Country:
- North America > United States
- California > Alameda County
- Berkeley (0.04)
- New York > Tompkins County
- Ithaca (0.04)
- Genre:
- Research Report (0.70)