Exploration Bonus for Regret Minimization in Undiscounted Discrete and Continuous Markov Decision Processes
