IMED-RL: Regret optimal learning of ergodic Markov decision processes