On Convergence of Average-Reward Off-Policy Control Algorithms in Weakly Communicating MDPs
