Batch Policy Learning in Average Reward Markov Decision Processes
