Learning and Planning in Average-Reward Markov Decision Processes
