Reinforcement Learning in Reward-Mixing MDPs