Online learning in episodic Markovian decision processes by relative entropy policy search
