Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning

Yuan Zhou; Xiangyang Ji