Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality
