On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes
