Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs
Neural Information Processing Systems
We consider gap-dependent regret bounds for episodic MDPs. We show that the Monotonic Value Propagation (MVP) algorithm (Zhang et al. [2024]) achieves a variance-aware gap-dependent regret bound of O(…)
Jun-21-2026, 01:37:22 GMT