Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
