Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs
