Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning