Q-Learning with Shift-Aware Upper Confidence Bound in Non-Stationary Reinforcement Learning