Optimal Best Markovian Arm Identification with Fixed Confidence

Neural Information Processing Systems 

To analyze the Track-and-Stop strategy, we derive a novel concentration inequality for Markov chains that may be of interest in its own right.