Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
