Supplementary Material Table of Contents

Neural Information Processing Systems 

Returning to the variational problem in Equation (A.5), we can now write D (by Lemma 2). Assume |A| < ∞ and that the MDP is ergodic. Parts of this proof are adapted from the proof given in Haarnoja et al. Convergence follows from the Outcome-Driven Policy Evaluation result above. We will use analogous notation for p. The result follows from Lemma 4, Equations (A.128) and (A.129), and the definition of f.
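The convergence argument adapted from Haarnoja et al. rests on repeatedly applying a policy-evaluation backup until it reaches a fixed point. The snippet below is a minimal sketch of that idea, assuming the standard discounted soft policy evaluation of Haarnoja et al. (a soft Bellman backup with an entropy term), not the outcome-driven operator used in this paper. The discount γ = 0.9, temperature α = 1.0, and the helpers `soft_backup`, `soft_policy_evaluation`, and `random_mdp` are hypothetical illustrations. The finite action set (|A| < ∞) together with a strictly positive policy keeps every log π(a|s) term finite.

```python
import math
import random

def soft_backup(Q, P, R, pi, gamma, alpha):
    """One application of a soft Bellman operator T^pi (sketch, Haarnoja et al. style)."""
    n_s, n_a = len(R), len(R[0])
    # Soft state value: V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s))
    V = [sum(pi[s][a] * (Q[s][a] - alpha * math.log(pi[s][a])) for a in range(n_a))
         for s in range(n_s)]
    # (T^pi Q)(s,a) = r(s,a) + gamma * sum_{s'} p(s'|s,a) * V(s')
    return [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
             for a in range(n_a)] for s in range(n_s)]

def soft_policy_evaluation(P, R, pi, gamma=0.9, alpha=1.0, tol=1e-10, max_iters=100_000):
    """Repeat Q <- T^pi Q from Q = 0 until the sup-norm change drops below tol.

    P[s][a][t]: transition probabilities, R[s][a]: rewards,
    pi[s][a]: policy probabilities (assumed > 0 so log pi is finite).
    Returns (Q, number_of_iterations).
    """
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for k in range(1, max_iters + 1):
        Q_new = soft_backup(Q, P, R, pi, gamma, alpha)
        delta = max(abs(Q_new[s][a] - Q[s][a]) for s in range(n_s) for a in range(n_a))
        Q = Q_new
        if delta < tol:  # gamma-contraction => geometric convergence
            return Q, k
    return Q, max_iters

def random_mdp(n_s, n_a, seed=0):
    """Hypothetical random finite MDP plus a strictly positive stochastic policy."""
    rng = random.Random(seed)

    def simplex(n):
        w = [rng.random() + 1e-3 for _ in range(n)]
        z = sum(w)
        return [x / z for x in w]

    P = [[simplex(n_s) for _ in range(n_a)] for _ in range(n_s)]
    R = [[rng.uniform(-1.0, 1.0) for _ in range(n_a)] for _ in range(n_s)]
    pi = [simplex(n_a) for _ in range(n_s)]
    return P, R, pi

if __name__ == "__main__":
    P, R, pi = random_mdp(n_s=4, n_a=3)
    Q, iters = soft_policy_evaluation(P, R, pi)
    TQ = soft_backup(Q, P, R, pi, 0.9, 1.0)
    residual = max(abs(x - y) for rq, rt in zip(Q, TQ) for x, y in zip(rq, rt))
    print(f"converged in {iters} iterations, fixed-point residual {residual:.2e}")
```

Because the backup is a γ-contraction in the sup norm, the iterates converge geometrically from any bounded starting Q. The check at the end confirms that the returned Q is, numerically, a fixed point of T^π.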
