Provably Fast Convergence of Independent Natural Policy Gradient for Markov Potential Games

Neural Information Processing Systems 

It is shown that, under mild technical assumptions and the introduction of the \textit{suboptimality gap}, the independent NPG method with an oracle providing exact policy evaluation asymptotically reaches an \epsilon -Nash Equilibrium (NE) within \mathcal{O}(1/\epsilon) iterations. This improves upon the previous best result of \mathcal{O}(1/\epsilon 2) iterations and is of the same order, \mathcal{O}(1/\epsilon), that is achievable for the single-agent case. Empirical results for a synthetic potential game and a congestion game are presented to verify the theoretical bounds.