Supplement to " Maximum Average Randomly Sampled: A Scale Free and Non-parametric Algorithm for Stochastic Bandits "

Neural Information Processing Systems 

The following lemma given in [2] is useful for the proof of Theorem 1. Lemma 1. [2] Given a stochastic matrix H = 0 0 0 h The following propositions are used to prove this theorem. In this case, there is not enough observations to achieve an upper confidence bound using Proposition 2. The randomized UCB for this case has also an exact confidence as illustrated below: Pr{UCB In the second equality, the boundedness of the means of the arms and Proposition 1 were utilized. The steps in this proof closely follows the proof of Theorem 7.1 in [3]. Let us define a'good' event as G We are going to show 1. The next step is to bound the probability of the second set in (3).

Similar Docs  Excel Report  more

TitleSimilaritySource
None found