Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition