Double Q-learning for Value-based Deep Reinforcement Learning, Revisited