Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality