Convergence for Natural Policy Gradient on Infinite-State Average-Reward Markov Decision Processes
