Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs