9 Appendix
For all the following derivations, we use D
Based on Lemma 2, we can derive an upper bound on our original objective:
Theorem 1 (Surrogate Objective as the Divergence Upper Bound).
We provide a proof with a counter-example.
Based on Assumption 1, we have the following:
Corollary 1.
A similar strategy is adopted by [3], Sec. 3.2, to learn a value function v (s,s