Learning and Planning in Average-Reward Markov Decision Processes