Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning
