Kernelized Reinforcement Learning with Order Optimal Regret Bounds

Neural Information Processing Systems 

Our results show a significant improvement over the state of the art, by a factor polynomial in the number of episodes.