



A Proofs of the Main Results

Neural Information Processing Systems

Finally, let us consider (b), the general case. ... A.2 Proof of Proposition 2. We will derive the gradients of the unnormalized posterior since ... In practice, we recommend the log-sum-exp trick for applying Proposition 2. Let us define ... Again, let us first consider case (a).
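The snippet above recommends the log-sum-exp trick but does not show it. The following is a minimal, generic sketch of that trick in Python; it is not taken from the paper, and how it plugs into Proposition 2 is not recoverable from the excerpt. The function names `log_sum_exp` and `normalize_log_weights` are hypothetical, chosen for illustration only.

```python
import math


def log_sum_exp(log_values):
    """Compute log(sum(exp(v))) stably by factoring out the maximum.

    Generic illustration (an assumption, not the paper's code):
    log(sum_i exp(v_i)) = m + log(sum_i exp(v_i - m)), with m = max_i v_i.
    """
    m = max(log_values)
    if math.isinf(m):
        # Every term is -inf (the sum is 0) or some term is +inf.
        return m
    # After shifting, every exponent is <= 0, so exp() cannot overflow.
    return m + math.log(sum(math.exp(v - m) for v in log_values))


def normalize_log_weights(log_weights):
    """Turn unnormalized log-weights into probabilities that sum to 1."""
    log_z = log_sum_exp(log_weights)
    return [math.exp(v - log_z) for v in log_weights]


# Naively evaluating math.exp(1000.0) raises OverflowError;
# the shifted form stays finite.
stable = log_sum_exp([1000.0, 1000.0])
probs = normalize_log_weights([1000.0, 1000.0, 999.0])
```

Subtracting the maximum before exponentiating leaves the result mathematically unchanged while keeping every intermediate value in a representable range, which is why the trick is commonly used when normalizing quantities that are only available in log space.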






A Proofs

Neural Information Processing Systems

We lay out the proof in two major steps. From the Performance Difference Lemma B.2, we obtain J(q ... Combining with (A.4) gives us the iterative improvement bound as follows: J(π ... From the Simulation Lemma B.1, we have the bound of ... We further know from the construction of the confidence set (cf. ... Similar to the proof in A.2, we obtain from the Simulation Lemma B.1 that E ... V ... The claim is thus established. This lemma is widely adopted in RL. A proof can be found in various previous works, e.g. ...