Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs
