Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs

Neural Information Processing Systems 

We consider gap-dependent regret bounds for episodic MDPs.
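
For reference, the standard quantities behind such bounds can be written as follows. This is a sketch in conventional notation that the text above does not fix: the horizon $H$, the number of episodes $K$, the optimal value functions $V^*_h$ and $Q^*_h$, the initial states $s_1^k$, and the policy $\pi_k$ played in episode $k$ are all assumed symbols.

```latex
% Suboptimality gap of action a in state s at step h (assumed notation)
\Delta_h(s,a) = V^*_h(s) - Q^*_h(s,a) \ge 0, \qquad h \in \{1, \dots, H\}

% Regret over K episodes, where episode k starts in state s_1^k
\mathrm{Regret}(K) = \sum_{k=1}^{K} \left( V^*_1(s_1^k) - V^{\pi_k}_1(s_1^k) \right)
```

A gap-dependent bound controls $\mathrm{Regret}(K)$ in terms of the positive gaps $\Delta_h(s,a)$, and typically grows logarithmically in $K$ rather than at the worst-case $\sqrt{K}$ rate.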