Supplementary Materials for "Multi-Agent Meta-Reinforcement Learning"

A Technical Lemmas
From the three-point identity of the Bregman divergence (Lemma 3.1 of [9]),
$$\mathrm{KL}(x \,\|\, y) - \mathrm{KL}(\hat{x} \,\|\, y) = \mathrm{KL}(x \,\|\, \hat{x}) + \langle \ln \hat{x} - \ln y,\; x - \hat{x} \rangle. \tag{12}$$
The first term in (12) can be bounded by $\mathrm{KL}(x \,\|\, \hat{x}) = [...]$ By Hölder's inequality, the second term in (12) is bounded as
$$\langle \ln \hat{x} - \ln y,\; x - \hat{x} \rangle \le \|\ln \hat{x} - \ln y\|_{\infty} \, \|x - \hat{x}\|_{1}.$$

Lemma 5. Consider a block diagonal matrix [...]

Proof. We prove the lemma via induction on $N$. [...] This completes the induction proof.

Lemma 6. [...]

Proof. We introduce one more notation before presenting the proof. [...]

This leads us to the initialization-dependent convergence rate of Algorithm 1, which we restate and prove as follows. [...] In addition, if we initialize the players' policies to be uniform policies, i.e., [...] The rest of the proof follows by putting all the aforementioned results together.
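The passage above appeals to initializing the players' policies to be uniform. A standard fact that typically underlies such initialization-dependent rates (stated here as background, since the actual bound is elided) is that the KL divergence from any distribution $p$ over $n$ actions to the uniform distribution $u$ is at most $\ln n$:

```latex
\[
\mathrm{KL}(p \,\|\, u)
  = \sum_{a=1}^{n} p(a) \ln \frac{p(a)}{1/n}
  = \ln n - H(p)
  \le \ln n ,
\]
```

where $H(p)$ is the Shannon entropy. Hence a uniform initialization makes the initial divergence term in the rate at most logarithmic in the size of each player's action set.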
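The three-point identity and the Hölder step above can be verified numerically. The sketch below checks the KL form of the identity and the $\ell_\infty$/$\ell_1$ Hölder bound on arbitrary random distributions; the distributions and dimension are illustrative choices, not quantities from the paper.

```python
import numpy as np

def kl(p, q):
    """KL divergence between discrete distributions p and q (elementwise positive)."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
# Three arbitrary distributions on 5 points: x, the intermediate point xh, and y.
x, xh, y = (v / v.sum() for v in rng.random((3, 5)))

# Three-point identity (12): KL(x||y) - KL(xh||y) = KL(x||xh) + <ln xh - ln y, x - xh>.
lhs = kl(x, y) - kl(xh, y)
rhs = kl(x, xh) + np.dot(np.log(xh) - np.log(y), x - xh)
assert np.isclose(lhs, rhs)

# Hoelder's inequality with the (infinity, 1) pairing bounds the inner-product term.
inner = np.dot(np.log(xh) - np.log(y), x - xh)
bound = np.max(np.abs(np.log(xh) - np.log(y))) * np.sum(np.abs(x - xh))
assert inner <= bound
```

The identity holds exactly (up to floating-point error) because both sides expand to $\sum_a x(a)\ln x(a) - \hat{x}(a)\ln\hat{x}(a) - x(a)\ln y(a) + \hat{x}(a)\ln y(a)$.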
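The statement of Lemma 5 is elided in the text above. A standard fact about a block diagonal matrix $M = \mathrm{diag}(A_1, \dots, A_N)$ that is typically proved by induction on $N$, as in the proof sketch above, is that its spectral norm equals the maximum of the blocks' spectral norms; the check below illustrates that property (an assumption about the lemma's content, with arbitrary blocks).

```python
import numpy as np

rng = np.random.default_rng(1)
# Arbitrary square blocks of different sizes.
blocks = [rng.random((k, k)) for k in (2, 3, 4)]

# Assemble M = diag(A_1, A_2, A_3) by placing each block on the diagonal.
n = sum(b.shape[0] for b in blocks)
M = np.zeros((n, n))
i = 0
for b in blocks:
    k = b.shape[0]
    M[i:i + k, i:i + k] = b
    i += k

# Spectral norm (largest singular value); singular values of a block diagonal
# matrix are the union of the blocks' singular values.
spec_norm = lambda a: np.linalg.norm(a, 2)
assert np.isclose(spec_norm(M), max(spec_norm(b) for b in blocks))
```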
Reviewer
Lower bound on regret: Assuming you mean Theorem 3 here, the theorem is correct as stated. We do, however, use the correct definition in all of our proofs. We mean Lipschitz continuity, as we want nearby models to imply that the solution values are close. The use of this term is meant to follow the notation in Bottou et al. It is defined in the formal statement of Theorem 2 (Theorem 5 in the appendix).