Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs

Neural Information Processing Systems 

We consider gap-dependent regret bounds for episodic MDPs. We show that the Monotonic Value Propagation (MVP) algorithm (Zhang et al. [2024]) achieves a variance-aware gap-dependent regret bound of O(…)
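Gap-dependent bounds are stated in terms of per-step suboptimality gaps. The sketch below is a hedged illustration only. It is not the MVP algorithm and not the paper's code. It assumes the standard definition Δ_h(s,a) = V*_h(s) − Q*_h(s,a) for a tabular episodic MDP with known transitions and rewards. It computes these gaps by backward induction, together with the smallest positive gap, often written Δ_min. The function names `suboptimality_gaps` and `min_positive_gap`, the array layout, and the toy MDP are hypothetical.

```python
import numpy as np


def suboptimality_gaps(P, R):
    """Backward induction for a tabular episodic MDP (hypothetical helper).

    P: array of shape (H, S, A, S), P[h, s, a, s'] = P_h(s' | s, a)
    R: array of shape (H, S, A), expected reward r_h(s, a)
    Returns Q* (H, S, A), V* (H + 1, S) with V*[H] = 0, and the gaps
    Delta_h(s, a) = V*_h(s) - Q*_h(s, a) of shape (H, S, A).
    """
    H, S, A, _ = P.shape
    Q = np.zeros((H, S, A))
    V = np.zeros((H + 1, S))  # terminal values are zero
    for h in range(H - 1, -1, -1):
        # (S, A, S) @ (S,) -> (S, A): expected next-step optimal value
        Q[h] = R[h] + P[h] @ V[h + 1]
        V[h] = Q[h].max(axis=1)
    gaps = V[:H, :, None] - Q  # broadcast V*_h(s) over actions
    return Q, V, gaps


def min_positive_gap(gaps, tol=1e-12):
    """Smallest strictly positive gap; inf if every action is optimal."""
    positive = gaps[gaps > tol]
    return float(positive.min()) if positive.size else float("inf")


# Toy MDP (hypothetical): every (s, a) moves to state 0;
# action 0 pays 1.0 and action 1 pays 0.5 at every step.
H, S, A = 2, 2, 2
P = np.zeros((H, S, A, S))
P[:, :, :, 0] = 1.0
R = np.zeros((H, S, A))
R[:, :, 0] = 1.0
R[:, :, 1] = 0.5

Q, V, gaps = suboptimality_gaps(P, R)
print(min_positive_gap(gaps))  # prints 0.5
```

Here action 1 loses 0.5 in immediate reward at every step, so each of its gaps equals 0.5 and Δ_min = 0.5. Optimal actions have gap zero. The tolerance guards against floating-point noise when deciding which gaps count as positive.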
