Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-like Exploration