Reviews: Non-Asymptotic Pure Exploration by Solving Games

Neural Information Processing Systems 

This work studies the complexity of pure exploration when the confidence level is a constant. The approach is to solve the minimax optimization problem in the lower bound by viewing it as a two-player game and defining the players' learning dynamics. The authors also discuss a possible trade-off between the power of optimization oracles and the sample complexity guarantee. This seems to me a novel contribution to the field. I was able to follow the proof sketches in Section 3 and the technical claims in the paper seem valid to me.