On the consistency of hyper-parameter selection in value-based deep reinforcement learning