Supplement to " Maximum Average Randomly Sampled: A Scale Free and Non-parametric Algorithm for Stochastic Bandits "
–Neural Information Processing Systems
The following lemma given in [2] is useful for the proof of Theorem 1. Lemma 1. [2] Given a stochastic matrix H = 0 0 0 h The following propositions are used to prove this theorem. In this case, there is not enough observations to achieve an upper confidence bound using Proposition 2. The randomized UCB for this case has also an exact confidence as illustrated below: Pr{UCB In the second equality, the boundedness of the means of the arms and Proposition 1 were utilized. The steps in this proof closely follows the proof of Theorem 7.1 in [3]. Let us define a'good' event as G We are going to show 1. The next step is to bound the probability of the second set in (3).
Neural Information Processing Systems
Nov-20-2025, 21:32:31 GMT
- Country:
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- Oceania > Australia
- Europe > United Kingdom
- Technology: