Faster Non-asymptotic Convergence for Double Q-learning