Reinforcement Learning under Model Mismatch