On the Estimation Bias in Double Q-Learning