Is Q-Learning Provably Efficient? An Extended Analysis