Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited
