On the convergence of policy gradient methods to Nash equilibria in general stochastic games

Anonymous Author(s)

Apr-25-2026, 07:56:15 GMT–Neural Information Processing Systems

Multi-agent learning in stochastic N-player games is a notoriously difficult problem because, in addition to their changing strategic decisions, the players of the game must also contend with the fact that the game itself evolves over time, possibly in a very complicated manner. Because of this, the equilibrium convergence properties of popular learning algorithms, like policy gradient and its variants, are poorly understood, except in specific classes of games (such as potential or two-player, zero-sum games). In view of all this, we examine the long-run behavior of policy gradient methods with respect to Nash equilibrium policies that are second-order stationary (SOS) in a sense similar to the type of KKT sufficiency conditions used in optimization. Our analysis shows that SOS policies are locally attracting with high probability, and we show that policy gradient trajectories with gradient estimates provided by the REINFORCE algorithm achieve an O(1/√n) convergence rate to such equilibria if the method's step-size is chosen appropriately.
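As a rough illustration of the kind of scheme the abstract refers to, the sketch below runs policy gradient with REINFORCE-style (score-function) gradient estimates in a single-state, two-player game. Everything specific here is an assumption made for the example and is not taken from the paper: the prisoner's-dilemma rewards, the softmax parametrization, the function names `softmax`, `reinforce_estimate` and `run`, and the step-size schedule `gamma0 / n**power`.

```python
import math
import random

# Hypothetical prisoner's-dilemma rewards (an assumption, not from the paper).
# Actions: 0 = cooperate, 1 = defect. REWARDS[i][a1][a2] is player i's reward.
REWARDS = [
    [[3.0, 0.0], [5.0, 1.0]],  # player 1
    [[3.0, 5.0], [0.0, 1.0]],  # player 2
]


def softmax(logits):
    """Map logits to a probability vector (numerically stabilized)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_estimate(reward, action, probs):
    """Score-function estimate: reward * grad_theta log pi(action).

    For a softmax policy, grad_theta log pi(action) = onehot(action) - probs.
    """
    return [reward * ((1.0 if a == action else 0.0) - p)
            for a, p in enumerate(probs)]


def run(num_steps=5000, gamma0=0.5, power=0.5, seed=0):
    """Both players update their own logits from sampled play only."""
    rng = random.Random(seed)
    logits = [[0.0, 0.0], [0.0, 0.0]]
    for n in range(1, num_steps + 1):
        probs = [softmax(th) for th in logits]
        actions = [rng.choices([0, 1], weights=p)[0] for p in probs]
        step = gamma0 / n ** power  # assumed vanishing step-size schedule
        for i in range(2):
            reward = REWARDS[i][actions[0]][actions[1]]
            grad = reinforce_estimate(reward, actions[i], probs[i])
            logits[i] = [th + step * g for th, g in zip(logits[i], grad)]
    return [softmax(th) for th in logits]


if __name__ == "__main__":
    # Prints each player's final mixed strategy [P(cooperate), P(defect)].
    print(run())
```

In this toy game, defecting is a strictly dominant action, so (defect, defect) is a strict Nash equilibrium and the expected update pushes each player's probability mass towards it; individual runs remain noisy because each step uses a single sampled reward.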

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > United States (1.00)

Industry:
- Leisure & Entertainment > Games (0.46)

Technology:
- Information Technology
  - Game Theory (1.00)
  - Artificial Intelligence
    - Machine Learning > Reinforcement Learning (1.00)
    - Representation & Reasoning > Agents (0.88)
