Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs
