Optimal Sample Complexity of Reinforcement Learning for Mixing Discounted Markov Decision Processes