On learning history based policies for controlling Markov decision processes
