Supplement to " Model Selection in Contextual Stochastic Bandit Problems "
–Neural Information Processing Systems
In Section D we present the proofs for Section 5.1 In Section H we show the proofs of the lower bounds in Section 6. We outline briefly some other direct applications of our results. CORRAL will achieve regret O ( p | L | dT) . B.1 Original Corral The original Corral algorithm [2] is reproduced below. We reproduce the EXP3.P algorithm (Figure 3.1 in [ 's expected replay regret satisfies: Therefore total regret is bounded by 6 U ( T,) log( T) D.2 Applications of Proposition 5.1 We now show that several algorithms are ( U,, T) bounded: Lemma D.2.
Neural Information Processing Systems
Oct-3-2025, 06:29:16 GMT