–Neural Information Processing Systems
Lemma 2 (Chernoff bound for irreducible Markov chains). The proof is based on the argument given in Appendix A.2 of [7], adapted to the case of Markov chains. We start the analysis by establishing the relation between the expected regret, Equation 1, and its proxy, Equation 17. For the first part, we show in Appendix C that the expected number of times that an arm a ∈ {1, ..., N} has not been played is of the order of O(log log T). Assume that the one-parameter family of Markov chains on the finite state space S, together with the reward function f : S → R, satisfies conditions (18), (19), (20), (21), and (22).
Feb-8-2026, 12:46:43 GMT