Optimal Sample Complexity of Reinforcement Learning for Mixing Discounted Markov Decision Processes
