Policy Optimization for Markov Games: Unified Framework and Faster Convergence

Neural Information Processing Systems 

We begin by proposing an algorithmic framework for two-player zero-sum Markov games in the full-information setting, in which each iteration consists of a policy update step at each state, performed by a chosen matrix game algorithm, and a value update step with a chosen learning rate.
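
As a rough illustration of the two steps described above, the following sketch instantiates the framework under several assumptions that are not taken from the text: a finite-horizon tabular game, Hedge (multiplicative weights) as the matrix game algorithm, and a value learning rate of 1/t. The names `softmax` and `run_framework`, the step size `eta`, and the array layout are hypothetical and chosen only for this example; other matrix game algorithms and learning-rate schedules fit the same template.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def run_framework(r, P, T, eta=0.5):
    """Sketch of a policy-update / value-update loop (illustrative only).

    r[h][s][a][b]     : reward to the max-player, assumed to lie in [0, 1]
    P[h][s][a][b][s2] : probability of moving to state s2
    T                 : number of iterations
    eta               : Hedge step size (an assumption, not from the source)
    """
    H, S = len(r), len(r[0])
    A, B = len(r[0][0]), len(r[0][0][0])
    # Policies are stored as logits; uniform at initialization.
    logit_mu = [[[0.0] * A for _ in range(S)] for _ in range(H)]
    logit_nu = [[[0.0] * B for _ in range(S)] for _ in range(H)]
    V = [[0.0] * S for _ in range(H + 1)]  # V[H] stays 0 (terminal step)

    for t in range(1, T + 1):
        alpha = 1.0 / t  # assumed value learning rate
        for h in reversed(range(H)):
            for s in range(S):
                # Payoff matrix at (h, s) induced by the current value estimate.
                Q = [[r[h][s][a][b]
                      + sum(P[h][s][a][b][s2] * V[h + 1][s2] for s2 in range(S))
                      for b in range(B)] for a in range(A)]
                mu, nu = softmax(logit_mu[h][s]), softmax(logit_nu[h][s])
                # Policy update: one Hedge step per player on the matrix game Q.
                for a in range(A):
                    logit_mu[h][s][a] += eta * sum(Q[a][b] * nu[b] for b in range(B))
                for b in range(B):
                    logit_nu[h][s][b] -= eta * sum(mu[a] * Q[a][b] for a in range(A))
                mu, nu = softmax(logit_mu[h][s]), softmax(logit_nu[h][s])
                # Value update: move V toward the payoff of the updated policies.
                payoff = sum(mu[a] * Q[a][b] * nu[b]
                             for a in range(A) for b in range(B))
                V[h][s] = (1 - alpha) * V[h][s] + alpha * payoff

    policy_mu = [[softmax(l) for l in row] for row in logit_mu]
    policy_nu = [[softmax(l) for l in row] for row in logit_nu]
    return policy_mu, policy_nu, V
```

Calling `run_framework(r, P, T=50)` on a one-step, one-state game returns the two players' policies and the value table. Swapping the Hedge step for another matrix game algorithm, or replacing `alpha = 1.0 / t` with a different schedule, changes only the two marked update steps.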