Logarithmic Regret of Exploration in Average Reward Markov Decision Processes
