Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems 

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance.

Paper Summary: This paper treats a general multi-armed bandit problem in which the mean reward of each arm depends on a common unknown parameter. The authors consider a simple modification of the UCB1 algorithm and show, unsurprisingly, that it satisfies a regret bound similar to that of UCB1. The paper's main improvement is to show that when the optimal arm can be identified perfectly from samples of the optimal arm, the algorithm's regret is bounded by a constant independent of the time horizon.
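The summary describes the algorithm only as "a simple modification of UCB1". The sketch below shows one plausible structured-UCB rule, offered to make the setting concrete. It assumes a finite candidate set for the shared parameter, a known map from the parameter to the arm means, and Bernoulli rewards. The names `structured_ucb`, `means_fn`, `thetas` and `alpha` are hypothetical, and this is not claimed to be the authors' algorithm. Each round, the rule keeps the parameter values consistent with every arm's UCB1-style confidence interval. It then plays the arm with the highest mean attainable over that set, and falls back to plain UCB1 indices if the set is empty.

```python
import math
import random


def structured_ucb(means_fn, thetas, true_theta, horizon, alpha=2.0, seed=0):
    """Hypothetical UCB-style rule for arms whose means share one parameter.

    means_fn(theta) -> list of arm means (assumed known model).
    thetas: finite candidate set for the unknown parameter (assumption).
    Rewards are Bernoulli with means means_fn(true_theta) (assumption).
    Returns the sequence of pulled arm indices.
    """
    rng = random.Random(seed)
    true_means = means_fn(true_theta)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    pulls = []
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialise the estimates
        else:
            emp = [sums[i] / counts[i] for i in range(k)]
            width = [math.sqrt(alpha * math.log(t) / counts[i]) for i in range(k)]
            # Confidence set: parameters consistent with every arm's interval.
            conf = [th for th in thetas
                    if all(abs(means_fn(th)[i] - emp[i]) <= width[i]
                           for i in range(k))]
            if conf:
                # Optimism over the confidence set of parameters.
                scores = [max(means_fn(th)[i] for th in conf) for i in range(k)]
            else:
                # No candidate fits: fall back to ordinary UCB1 indices.
                scores = [emp[i] + width[i] for i in range(k)]
            arm = max(range(k), key=lambda i: scores[i])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        pulls.append(arm)
    return pulls


# Toy model (hypothetical): arm 0 has mean theta, arm 1 has mean 1 - theta.
pulls = structured_ucb(lambda th: [th, 1.0 - th],
                       [i / 20 for i in range(21)], 0.8, 2000)
print(pulls.count(0), pulls.count(1))
```

In this toy model the mean of arm 0 equals the parameter itself. Samples of the optimal arm therefore pin down the parameter, mirroring the condition under which the summary says regret stays bounded. Once the parameter is identified, the suboptimal arm is rarely played.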