Minimax Optimal Reinforcement Learning with Quasi-Optimism