Online Learning in Episodic Markovian Decision Processes by Relative Entropy Policy Search
