Supplementary Materials for "Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity"

A Proofs of the Main Results

Neural Information Processing Systems 

We first introduce some additional notation for convenience. Our proof mainly consists of the following steps:

1. Helper lemmas and a crude bound. See A.2, and more precisely, Lemmas A.9 and A.10.
3. Final bound for the ε-approximate NE value. See A.3.
4. Final bounds for the ε-NE policy. See A.5.

A.1 Important Lemmas

We start with the component-wise error bounds.
