On the convergence of policy gradient methods to Nash equilibria in general stochastic games
Neural Information Processing Systems
Multi-agent learning in stochastic N-player games is a notoriously difficult problem because, in addition to their changing strategic decisions, the players of the game must also contend with the fact that the game itself evolves over time, possibly in a very complicated manner. Because of this, the equilibrium convergence properties of popular learning algorithms, like policy gradient and its variants, are poorly understood except in specific classes of games (such as potential or two-player, zero-sum games). In view of all this, we examine the long-run behavior of policy gradient methods with respect to Nash equilibrium policies that are second-order stationary (SOS) in a sense similar to the type of KKT sufficiency conditions used in optimization. Our analysis shows that SOS policies are locally attracting with high probability, and that policy gradient trajectories with gradient estimates provided by the REINFORCE algorithm achieve an O(1/√n) convergence rate to such equilibria if the method's step-size is chosen appropriately.
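To make the setting concrete, the sketch below runs simultaneous policy gradient with one-episode REINFORCE estimates in a small tabular stochastic game. It is a minimal illustration under stated assumptions, not the authors' method: the softmax policy parameterization, the finite horizon and discount `gamma`, the step-size schedule `eta / sqrt(n)`, the toy game, and all function names (`reinforce_gradients`, `policy_gradient`, `two_player_demo`) are hypothetical choices made here. Each player i updates only its own logits with the estimate Σ_t γ^t G_t ∇ log π_i(a_{i,t} | s_t), where G_t is player i's discounted return-to-go, because the other players' policies do not depend on player i's parameters.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    # Draw index k with probability probs[k].
    u, acc = rng.random(), 0.0
    for k, p in enumerate(probs):
        acc += p
        if u < acc:
            return k
    return len(probs) - 1


def reinforce_gradients(theta, reward_fn, transition_fn, s0, horizon, gamma, rng):
    """One-episode REINFORCE estimate of every player's policy gradient.

    theta[i][s] holds player i's softmax logits at state s (an assumed
    parameterization); reward_fn(s, actions) returns one reward per player and
    transition_fn(s, actions, rng) samples the next state.
    """
    n_players = len(theta)
    states, joint, rewards = [], [], []
    s = s0
    for _ in range(horizon):
        actions = [sample(softmax(theta[i][s]), rng) for i in range(n_players)]
        states.append(s)
        joint.append(actions)
        rewards.append(reward_fn(s, actions))
        s = transition_fn(s, actions, rng)

    grads = [[[0.0] * len(row) for row in theta[i]] for i in range(n_players)]
    for i in range(n_players):
        ret = 0.0  # discounted return-to-go G_t of player i
        for t in reversed(range(horizon)):
            ret = rewards[t][i] + gamma * ret
            s, a = states[t], joint[t][i]
            probs = softmax(theta[i][s])
            for k, p in enumerate(probs):
                # d/d theta_k of log softmax(theta)[a] equals 1{k == a} - p_k
                score = (1.0 if k == a else 0.0) - p
                grads[i][s][k] += gamma ** t * ret * score
    return grads


def policy_gradient(theta, reward_fn, transition_fn, s0, iterations,
                    horizon=5, gamma=0.9, eta=0.2, seed=0):
    """All players ascend their own REINFORCE estimates simultaneously.

    The step size eta / sqrt(n) is an illustrative assumption, not the
    schedule analysed in the paper. theta is updated in place and returned.
    """
    rng = random.Random(seed)
    for n in range(1, iterations + 1):
        step = eta / math.sqrt(n)
        # Every player's gradient is estimated before any logits change.
        grads = reinforce_gradients(theta, reward_fn, transition_fn,
                                    s0, horizon, gamma, rng)
        for i, player in enumerate(theta):
            for s, logits in enumerate(player):
                for k in range(len(logits)):
                    logits[k] += step * grads[i][s][k]
    return theta


def two_player_demo(iterations=3000, seed=0):
    """Hypothetical 2-player, 2-state game: each player earns 1 for action 0,
    and the game moves to state 1 only when both players choose action 0."""
    def reward_fn(s, acts):
        return [1.0 if a == 0 else 0.0 for a in acts]

    def transition_fn(s, acts, rng):
        return 1 if acts == [0, 0] else 0

    theta = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    policy_gradient(theta, reward_fn, transition_fn, s0=0,
                    iterations=iterations, seed=seed)
    # Probability that each player picks action 0 at the initial state.
    return [softmax(theta[i][0])[0] for i in range(2)]


if __name__ == "__main__":
    print(two_player_demo())
```

In this toy game action 0 is dominant in every state, so each player's probability of choosing it at the initial state should drift toward 1; the exact values depend on the seed and the step-size schedule. Estimating all gradients from the same realized trajectory before updating anyone mirrors the simultaneous, independent learning that the abstract considers.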
Apr-25-2026, 07:56:15 GMT