On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games

Jan-22-2025, 04:38:54 GMT–Neural Information Processing Systems

Softmax policy gradient is a popular algorithm for policy optimization in single-agent reinforcement learning, particularly because no projection is needed after each gradient update. In multi-agent systems, however, the lack of central coordination introduces significant additional difficulties in the convergence analysis. Even an identical-interest stochastic game can have multiple Nash equilibria (NEs), which rules out proof techniques that rely on the existence of a unique global optimum. Moreover, the softmax parameterization introduces non-NE policies with zero gradient, making it difficult for gradient-based algorithms to find NEs. In this paper, we study the finite-time convergence of decentralized softmax gradient play in a special class of games, Markov Potential Games (MPGs), which includes identical-interest games as a special case. We investigate both gradient play and natural gradient play, with and without log-barrier regularization.
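As a rough illustration of the setting, the sketch below runs decentralized softmax gradient play on a two-player, single-state identical-interest game. This is an assumed toy instance, not the paper's algorithm, step size, or experiments: the names `softmax_gradient_play` and `expected_reward`, the reward matrix, and the hyperparameters are all hypothetical. Each player holds its own logits and ascends the gradient of the shared expected reward with respect to those logits only, with no projection step.

```python
import math


def softmax(theta):
    """Map a list of logits to a probability vector (numerically stable)."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(reward, p1, p2):
    """Shared expected reward when the players use mixed policies p1 and p2."""
    return sum(
        p1[a] * reward[a][b] * p2[b]
        for a in range(len(p1))
        for b in range(len(p2))
    )


def softmax_gradient_play(reward, steps=5000, lr=0.5):
    """Simultaneous, independent softmax gradient ascent for two players.

    reward[a][b] is the common payoff when player 1 plays a and player 2
    plays b (assumed toy setup: one state, identical interests).
    """
    n1, n2 = len(reward), len(reward[0])
    theta1, theta2 = [0.0] * n1, [0.0] * n2  # uniform initial policies
    for _ in range(steps):
        p1, p2 = softmax(theta1), softmax(theta2)
        # Expected payoff of each action against the other player's policy.
        q1 = [sum(reward[a][b] * p2[b] for b in range(n2)) for a in range(n1)]
        q2 = [sum(reward[a][b] * p1[a] for a in range(n1)) for b in range(n2)]
        v = sum(p1[a] * q1[a] for a in range(n1))
        # Softmax gradient: dV/dtheta_a = p_a * (q_a - V). Both players
        # update from the same pre-update policies, without coordination.
        theta1 = [theta1[a] + lr * p1[a] * (q1[a] - v) for a in range(n1)]
        theta2 = [theta2[b] + lr * p2[b] * (q2[b] - v) for b in range(n2)]
    return softmax(theta1), softmax(theta2)


# Two pure Nash equilibria: both play action 0 (payoff 2) or both play
# action 1 (payoff 1).
reward = [[2.0, 0.0], [0.0, 1.0]]
p1, p2 = softmax_gradient_play(reward)
print(p1, p2, expected_reward(reward, p1, p2))
```

From the uniform initialization used here, both policies concentrate on the first action. The game also has a second pure NE with lower payoff, which is the kind of multiplicity the abstract points to.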

decentralized softmax gradient play, global convergence rate, Markov potential game

Genre:
- Play > Prospect > Container > Trap (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents (1.00)
  - Machine Learning > Statistical Learning (1.00)