Appendices

A Proof for Theorem 1

Before proceeding, let us define an additional term S = …
Neural Information Processing Systems
Note that, trivially, we have S ≤ KM. … 's are generated under policy …

We now construct a high-probability confidence set. This lemma is based on Lemma 17 of Jaksch et al. [2010], which in turn relies on the following result. From Theorem 2.1 in Weissman et al. [2003], for any ε > 0, we have P{‖p(·) − p̂(·)‖₁ ≥ ε} ≤ … On the other hand, based on Hoeffding's inequality, if we choose …

… "constant-shift" property of the DP operator, and (c) follows from …
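The concentration steps above can be illustrated numerically. The following is a minimal Python sketch under stated assumptions, not the authors' construction. It assumes the commonly used simplified form of Weissman et al. [2003, Theorem 2.1], P{‖p − p̂‖₁ ≥ ε} ≤ (2^m − 2) exp(−nε²/2), for an empirical distribution over m outcomes built from n i.i.d. samples. It also assumes the two-sided Hoeffding bound for variables in [0, 1], and an undiscounted finite-MDP Bellman operator as a stand-in for the DP operator, whose exact form is not recoverable from this excerpt. All function names (`weissman_l1_radius`, `hoeffding_radius`, `bellman_operator`, `empirical_violation_rate`) are hypothetical.

```python
import math
import random


def weissman_l1_radius(num_outcomes: int, n: int, delta: float) -> float:
    """Smallest eps with (2^m - 2) * exp(-n * eps^2 / 2) <= delta.

    Assumed simplified L1 bound for an empirical distribution over
    m = num_outcomes outcomes built from n i.i.d. samples.
    """
    if num_outcomes < 2 or n < 1 or not 0.0 < delta < 1.0:
        raise ValueError("need num_outcomes >= 2, n >= 1, 0 < delta < 1")
    # math.log accepts arbitrarily large Python ints, so 2**m cannot overflow here.
    log_term = math.log(2 ** num_outcomes - 2) - math.log(delta)
    return math.sqrt(2.0 * log_term / n)


def hoeffding_radius(n: int, delta: float) -> float:
    """Width t with 2 * exp(-2 * n * t^2) = delta (mean of n i.i.d. vars in [0, 1])."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def bellman_operator(V, R, P):
    """Hypothetical undiscounted DP operator on a finite MDP.

    (TV)(s) = max_a [R[s][a] + sum_{s2} P[s][a][s2] * V[s2]].
    Assumption: no discount; a discounted version would shift by gamma * c.
    """
    return [
        max(R[s][a] + sum(p * v for p, v in zip(P[s][a], V)) for a in range(len(R[s])))
        for s in range(len(V))
    ]


def empirical_violation_rate(p, n, delta, trials, seed=0):
    """Fraction of trials in which ||p_hat - p||_1 >= the Weissman radius."""
    rng = random.Random(seed)
    eps = weissman_l1_radius(len(p), n, delta)
    outcomes = list(range(len(p)))
    violations = 0
    for _ in range(trials):
        counts = [0] * len(p)
        for x in rng.choices(outcomes, weights=p, k=n):
            counts[x] += 1
        l1 = sum(abs(c / n - q) for c, q in zip(counts, p))
        violations += l1 >= eps
    return violations / trials


if __name__ == "__main__":
    p = [0.4, 0.3, 0.2, 0.1]
    print(weissman_l1_radius(len(p), 200, 0.1))
    print(empirical_violation_rate(p, 200, 0.1, trials=500))
    # Constant-shift check: T(V + c) = T(V) + c because each row P[s][a] sums to 1.
    R = [[1.0, 0.0], [0.5, 0.2]]
    P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.0, 1.0]]]
    V, c = [0.3, -1.2], 2.5
    lhs = bellman_operator([v + c for v in V], R, P)
    rhs = [x + c for x in bellman_operator(V, R, P)]
    print(all(math.isclose(a, b) for a, b in zip(lhs, rhs)))
```

The last line prints `True`: adding a constant c to every entry of V adds exactly c inside each maximand, since the transition probabilities in each row sum to 1, so the maximum shifts by c as well. The empirical violation rate is expected to sit well below δ, because the Weissman bound is conservative for small m.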