Supplementary Material Table of Contents
Neural Information Processing Systems
Returning to the variational problem in Equation (A.5), we can now write D (by Lemma 2). Assume |A| < ∞ and that the MDP is ergodic. Parts of this proof are adapted from the proof given in Haarnoja et al. Convergence follows from Outcome-Driven Policy Evaluation above. We will use analogous notation for p. The result follows from Lemma 4, Equation (A.128), Equation (A.129), and the definition of f.
Aug-15-2025, 01:03:24 GMT