Reviews: Bootstrapping Upper Confidence Bound

Neural Information Processing Systems 

It should be acknowledged that the algorithm is significantly more complex than UCB1, for example. Indeed, at each time step, B bootstrap repetitions are needed to estimate the bootstrapped quantiles, and each of them requires drawing n_k random variables for each arm k (the values of the w's). Moreover, the algorithm has to store the past rewards obtained on all arms, which requires a lot of memory. This constraint also applies to the empirical KL-UCB mentioned above, which is one more reason to compare these two algorithms of similar complexity. From Theorem 2, I guess that the w's are Rademacher random variables, but it would be good to specify this in the statement of the algorithm.
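To make the complexity concern concrete, here is a minimal sketch of the per-arm bootstrapped index computation I have in mind, assuming (as Theorem 2 suggests, though the paper should confirm it) that the w's are Rademacher weights; the function name, the quantile level, and the exact form of the perturbation are my own illustrative choices, not the authors' algorithm.

```python
import numpy as np

def bootstrapped_index(rewards, B=100, quantile=0.95, rng=None):
    """Sketch of one arm's bootstrapped exploration index.

    `rewards` is the full reward history of the arm, which must be
    stored in memory. Each of the B repetitions draws n_k Rademacher
    weights, so a single time step costs O(B * sum_k n_k) random
    draws across all K arms -- the overhead discussed above.
    """
    rng = np.random.default_rng() if rng is None else rng
    rewards = np.asarray(rewards, dtype=float)
    n_k = len(rewards)
    mean = rewards.mean()
    # B x n_k Rademacher weights in {-1, +1} (assumption from Theorem 2)
    w = rng.choice([-1.0, 1.0], size=(B, n_k))
    # perturbed deviations: weighted averages of the centered rewards
    perturbed = (w * (rewards - mean)).mean(axis=1)
    # empirical mean plus a bootstrapped quantile of the perturbation
    return mean + np.quantile(perturbed, quantile)
```

At each round the learner would recompute this index for every arm and pull the maximizer, which is where the B-fold cost per step comes from, in contrast to UCB1's closed-form bonus.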