Variational Regret Bounds for Reinforcement Learning