Closing the Computational-Statistical Gap in Best Arm Identification for Combinatorial Semi-bandits

Neural Information Processing Systems 

An efficient method to design statistically optimal algorithms solving active learning tasks (e.g., regret minimization or pure exploration in bandits and reinforcement learning) consists in the following two-step procedure.