Appendix for " Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games " Table of Contents

Neural Information Processing Systems 

A.1 Proof of Theorem 1 To prove Theorem 1, we need the help of the following Lemma See Proposition 7.1 in [3]. Now we can prove our Theorem 1. Proof. For games with only one step (normal-form games, functional-form games), there is only one fixed state. Therefore, the distribution of state-action is equivalent to the distribution of the action. A.2 Proof of Theorem 2 Let us restate our Theorem 2 Theorem 2. For a given empirical payoff matrix A RM N and the reward vector aM+1 for policy M + ||(I A>(A>))aM+1||2, (18) where (A>) is the Moore-Penrose pseudoinverse of A>, and σmin(A) is the minimum singular value of A. Proof. The last equation comes from the analytic calculation of min1>β=1 ||β (A>) aM+1||2 using Lagrangian.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found