Variational Bayesian Reinforcement Learning with Regret Bounds