Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited