Is Q-Learning Provably Efficient?