Finite-TimeAnalysisofRound-Robin Kullback-LeiblerUpperConfidenceBoundsfor OptimalAdaptiveAllocationwithMultiplePlaysand MarkovianRewards