Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs