Supplementary Materials for "Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity"
Neural Information Processing Systems

A Proofs of the Main Results

We first introduce some additional notation for convenience. Our proof mainly consists of the following steps:
1. Helper lemmas and a crude bound. See A.2, and more precisely, Lemmas A.9 and A.10.
3. Final bound for the ε-approximate NE value. See A.3.
4. Final bounds for the ε-NE policy. See A.5.

A.1 Important Lemmas

We start with the component-wise error bounds.
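To make the ε-NE notion in the proof roadmap concrete, the sketch below treats only the single-state special case (a zero-sum matrix game with payoff matrix A). This case is an illustrative assumption: the Markov-game definition applies the same idea at every state, using the value functions of the policy pair rather than a single matrix. The row player with mixed strategy mu maximizes mu^T A nu and the column player with nu minimizes it. The helper names `expected_payoff`, `best_response_values`, `nash_gap`, and `is_eps_ne` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def expected_payoff(A: Matrix, mu: Sequence[float], nu: Sequence[float]) -> float:
    """mu^T A nu: expected payoff to the maximizing (row) player."""
    return sum(mu[i] * A[i][j] * nu[j]
               for i in range(len(A)) for j in range(len(A[0])))


def best_response_values(A: Matrix, mu: Sequence[float],
                         nu: Sequence[float]) -> Tuple[float, float]:
    """Best unilateral-deviation values for both players.

    In a matrix game a best response can always be taken to be a pure
    strategy, so it suffices to scan rows (max player against nu) and
    columns (min player against mu).
    """
    n_rows, n_cols = len(A), len(A[0])
    row_best = max(sum(A[i][j] * nu[j] for j in range(n_cols))
                   for i in range(n_rows))
    col_best = min(sum(mu[i] * A[i][j] for i in range(n_rows))
                   for j in range(n_cols))
    return row_best, col_best


def nash_gap(A: Matrix, mu: Sequence[float], nu: Sequence[float]) -> float:
    """max_{mu'} mu'^T A nu - min_{nu'} mu^T A nu'; nonnegative, zero exactly at a NE."""
    row_best, col_best = best_response_values(A, mu, nu)
    return row_best - col_best


def is_eps_ne(A: Matrix, mu: Sequence[float], nu: Sequence[float],
              eps: float) -> bool:
    """True if neither player gains more than eps by deviating unilaterally."""
    v = expected_payoff(A, mu, nu)
    row_best, col_best = best_response_values(A, mu, nu)
    return row_best - v <= eps and v - col_best <= eps
```

For matching pennies, A = [[1, -1], [-1, 1]], uniform play by both players has a gap of zero. If the row player shifts to (0.6, 0.4) while the column player stays uniform, the column player can gain 0.2 by deviating, so the pair is a 0.2-NE but not a 0.1-NE. Since the gap is the sum of the two players' deviation gains, a gap of at most ε implies an ε-NE, and an ε-NE implies a gap of at most 2ε.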
Feb-7-2026, 11:13:36 GMT