A Proofs
Neural Information Processing Systems
We lay out the proof in two major steps. From the Performance Difference Lemma (Lemma B.2), we obtain J(q … Combining this with (A.4) gives the iterative improvement bound: J(π … From the Simulation Lemma (Lemma B.1), we have the bound of … We further know from the construction of the confidence set (cf. … Similarly to the proof in A.2, we obtain from the Simulation Lemma (Lemma B.1) that E … V … The claim is thus established.

This lemma is widely adopted in RL. Proofs can be found in various previous works, e.g. …
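For orientation, the performance difference lemma invoked above is commonly stated as follows in the discounted infinite-horizon setting. This is a sketch of the textbook form only: the paper's own Lemma B.2 is not reproduced in this excerpt and may use a different setting or normalization. The symbols $\gamma$ (discount factor), $\rho$ (initial state distribution), $d^{\pi}_{\rho}$ (normalized discounted state occupancy) and $A^{\pi'}$ (advantage function of $\pi'$) are assumptions introduced here, not notation taken from the source.

```latex
% Textbook performance difference lemma (assumed discounted setting;
% not necessarily the exact statement of the paper's Lemma B.2).
\begin{align*}
J(\pi) - J(\pi')
  &= \frac{1}{1-\gamma}\,
     \mathbb{E}_{s \sim d^{\pi}_{\rho},\; a \sim \pi(\cdot \mid s)}
     \bigl[ A^{\pi'}(s, a) \bigr], \\
\text{where}\quad
J(\pi) &= \mathbb{E}_{s_0 \sim \rho,\; \pi}\Bigl[ \sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t) \Bigr], \\
d^{\pi}_{\rho}(s) &= (1-\gamma) \sum_{t=0}^{\infty} \gamma^{t}\,
     \Pr(s_t = s \mid \pi, \rho), \\
A^{\pi'}(s, a) &= Q^{\pi'}(s, a) - V^{\pi'}(s).
\end{align*}
```

Under these assumptions, the identity expresses the return gap between two policies through the advantage of one policy evaluated along the other's state distribution, which is the kind of quantity an iterative improvement bound accumulates over iterations.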
Aug-17-2025, 08:37:16 GMT