Policy Optimization for Markov Games: Unified Framework and Faster Convergence
Runyu Zhang
Harvard University

Neural Information Processing Systems 

Policy optimization, i.e., the class of algorithms that learn to make sequential decisions by searching locally and directly over the agent's policy, is widely used in reinforcement learning [