Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning
Yuan Zhou; Xiangyang Ji
Neural Information Processing Systems
In this paper, we study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with a constraint on the number of batches. In the multi-batch reinforcement learning framework, the agent must commit in advance to a time schedule for updating its policy. This framework is particularly suitable for scenarios where changing the policy adaptively is very costly for the agent.
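To make the batch constraint concrete, the following is a minimal sketch of an interaction loop in which the update times are fixed before any episode is played. It is an illustration only and not the algorithm analyzed in the paper: the evenly spaced schedule and the names `uniform_batch_schedule`, `run_batched`, `make_policy`, and `run_episode` are all assumptions introduced here.

```python
from typing import Any, Callable, List, Tuple


def uniform_batch_schedule(num_episodes: int, num_batches: int) -> List[int]:
    """Return the 0-based episode indices at which each batch starts.

    Assumption: batches are spaced evenly. The setting only requires that
    the schedule be fixed before interaction begins.
    """
    if not 1 <= num_batches <= num_episodes:
        raise ValueError("need 1 <= num_batches <= num_episodes")
    return [(b * num_episodes) // num_batches for b in range(num_batches)]


def run_batched(
    num_episodes: int,
    schedule: List[int],
    make_policy: Callable[[List[Any]], Any],
    run_episode: Callable[[Any], Any],
) -> Tuple[int, List[Any]]:
    """Play episodes, changing the policy only at the precommitted times."""
    starts = set(schedule)
    policy = None
    data: List[Any] = []
    updates = 0
    for k in range(num_episodes):
        if k in starts:
            # The policy may depend only on data gathered in earlier batches.
            policy = make_policy(data)
            updates += 1
        # Within a batch the policy stays fixed, whatever is observed.
        data.append(run_episode(policy))
    return updates, data


if __name__ == "__main__":
    schedule = uniform_batch_schedule(10, 3)
    # Toy stand-ins: the "policy" is the amount of data seen when it was built.
    updates, data = run_batched(10, schedule, lambda d: len(d), lambda p: p)
    print(schedule, updates, data)
```

In this toy run the policy changes exactly as many times as there are batches, and every episode inside a batch is played with the policy chosen at the start of that batch.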
Feb-9-2025, 14:16:36 GMT