Learning Kernel-Based MDPs from Episodic Preferential Feedback