Solving Zero-Sum Convex Markov Games

Kalogiannis, Fivos, Vlatakis-Gkaragkounis, Emmanouil-Vasileios, Gemp, Ian, Piliouras, Georgios

Jun-23-2025–arXiv.org Artificial Intelligence

We contribute the first provable guarantees of global convergence to Nash equilibria (NE) in two-player zero-sum convex Markov games (cMGs) by using independent policy gradient methods. Convex Markov games, recently defined by Gemp et al. (2024), extend Markov decision processes to multi-agent settings with preferences that are convex over occupancy measures, offering a broad framework for modeling generic strategic interactions. However, even the fundamental min-max case of cMGs presents significant challenges, including inherent nonconvexity, the absence of Bellman consistency, and the complexity of the infinite horizon. We follow a two-step approach. First, leveraging properties of hidden-convex--hidden-concave functions, we show that a simple nonconvex regularization transforms the min-max optimization problem into a nonconvex-proximal Polyak-Lojasiewicz (NC-pPL) objective. Crucially, this regularization can stabilize the iterates of independent policy gradient methods and ultimately lead them to converge to equilibria. Second, building on this reduction, we address the general constrained min-max problems under NC-pPL and two-sided pPL conditions, providing the first global convergence guarantees for stochastic nested and alternating gradient descent-ascent methods, which we believe may be of independent interest.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

Jun-23-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Texas (0.04)
  - Wisconsin > Dane County
    - Madison (0.14)
  - New York > New York County
    - New York City (0.04)
  - Illinois > Cook County
    - Chicago (0.04)
  - California > San Diego County
    - San Diego (0.04)
    - La Jolla (0.04)
- Europe
  - Italy (0.04)
  - Greece (0.04)
  - United Kingdom > England
    - Greater London > London (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - China (0.04)
  - Japan > Honshū
    - Chūgoku > Hiroshima Prefecture > Hiroshima (0.04)

Genre:
- Research Report (0.81)
- Overview (0.67)

Industry:
- Leisure & Entertainment > Games (1.00)
- Energy (0.76)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Representation & Reasoning
    - Optimization (1.00)
    - Agents (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Statistical Learning > Gradient Descent (0.49)
    - Neural Networks > Deep Learning (0.45)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found