Kernelized Reinforcement Learning with Order Optimal Regret Bounds