Information-Theoretic Confidence Bounds for Reinforcement Learning