Dynamic Learning Rate for Deep Reinforcement Learning: A Bandit Approach