Data- and Variance-dependent Regret Bounds for Online Tabular MDPs
