Provably Efficient Q-Learning with Low Switching Cost
Yu Bai, Tengyang Xie, Nan Jiang, Yu-Xiang Wang
–Neural Information Processing Systems
We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change its exploration policy as infrequently as possible during regret minimization. This is motivated by the difficulty of running fully adaptive algorithms in real-world applications (such as medical domains), and we propose to quantify adaptivity using the notion of local switching cost.
Neural Information Processing Systems
Jan-23-2025, 09:57:24 GMT