Policy Optimization for Markov Games: Unified Framework and Faster Convergence
Neural Information Processing Systems
We begin by proposing an algorithmic framework for two-player zero-sum Markov games in the full-information setting. Each iteration consists of a policy update step at every state, performed with a chosen matrix-game algorithm, followed by a value update step with a prescribed learning rate.
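As a rough illustration of this two-step iteration, the following is a minimal Python sketch for a discounted game with known rewards and transitions. It is not the paper's algorithm. The matrix-game algorithm is swapped for plain multiplicative weights (Hedge). The step size `eta`, the learning-rate schedule alpha_t = (H+1)/(H+t) with H = 1/(1-gamma), and every function name (`policy_optimization`, `hedge_step`, `bilinear_value`) are assumptions chosen for illustration.

```python
import math


def bilinear_value(Q, x, y):
    """Expected payoff x^T Q y of matrix game Q under mixed strategies x, y."""
    return sum(x[a] * Q[a][b] * y[b] for a in range(len(x)) for b in range(len(y)))


def hedge_step(p, payoff, eta, maximize):
    """One multiplicative-weights (Hedge) step, used here as a stand-in
    matrix-game algorithm: the max player shifts mass toward high payoff,
    the min player toward low payoff."""
    sign = 1.0 if maximize else -1.0
    w = [pi * math.exp(sign * eta * gi) for pi, gi in zip(p, payoff)]
    total = sum(w)
    return [wi / total for wi in w]


def policy_optimization(reward, transition, gamma, num_iters, eta=0.1):
    """Hypothetical sketch of the policy-update / value-update loop.

    reward[s][a][b]         -- payoff to the max player at state s
    transition[s][a][b][s2] -- probability of moving from s to s2
    gamma                   -- discount factor, assumed 0 <= gamma < 1
    Returns per-state policies x (max player), y (min player) and values V.
    """
    S, A, B = len(reward), len(reward[0]), len(reward[0][0])
    x = [[1.0 / A] * A for _ in range(S)]  # uniform initial policies
    y = [[1.0 / B] * B for _ in range(S)]
    V = [0.0] * S
    H = 1.0 / (1.0 - gamma)  # effective horizon for the assumed schedule
    for t in range(1, num_iters + 1):
        alpha = (H + 1.0) / (H + t)  # assumed learning rate; alpha_1 = 1
        new_x, new_y, new_V = [], [], []
        for s in range(S):
            # Matrix game at state s built from the current value estimate.
            Q = [[reward[s][a][b]
                  + gamma * sum(transition[s][a][b][s2] * V[s2] for s2 in range(S))
                  for b in range(B)] for a in range(A)]
            # Policy update: each player takes one Hedge step on its payoff vector.
            gx = [sum(Q[a][b] * y[s][b] for b in range(B)) for a in range(A)]
            gy = [sum(Q[a][b] * x[s][a] for a in range(A)) for b in range(B)]
            xs = hedge_step(x[s], gx, eta, maximize=True)
            ys = hedge_step(y[s], gy, eta, maximize=False)
            # Value update: move V(s) toward the game value under the new policies.
            new_V.append((1.0 - alpha) * V[s] + alpha * bilinear_value(Q, xs, ys))
            new_x.append(xs)
            new_y.append(ys)
        x, y, V = new_x, new_y, new_V
    return x, y, V
```

For example, on a one-state game in which one row strictly dominates the other, the max player's policy concentrates on the dominant row and V approaches that row's payoff. Swapping `hedge_step` for another no-regret or optimistic update changes only the policy step, which is the modularity the framework is built around.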
Dec-24-2025, 17:31:46 GMT