Finite-Time Analysis of Simultaneous Double Q-learning