Regret Bounds for Reinforcement Learning via Markov Chain Concentration
