Policy Certificates and Minimax-Optimal PAC Bounds for Episodic Reinforcement Learning
