Appendix for " Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games " T able of Contents

Neural Information Processing Systems 

A.1 Proof of Theorem 1 To prove Theorem 1, we need the help of the following Lemma Lemma 1. See Proposition 7.1 in [3]. Now we can prove our Theorem 1. Proof. Therefore, the distribution of state-action is equivalent to the distribution of the action. A.3 Proof of Theorem 3 Now let us first restate the propositions. PE is equivalent to exploitability.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found