Finding good policies in average-reward Markov Decision Processes without prior knowledge
