On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
Neural Information Processing Systems
We consider infinite-horizon stationary γ-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy.
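As a rough illustration of the claim that a stationary optimal policy exists, the following minimal sketch runs standard value iteration on a tiny, entirely hypothetical two-state, two-action MDP and reads off a greedy policy that depends only on the current state. The transition table `P`, reward table `R`, the discount `GAMMA = 0.9`, and the function name `value_iteration` are assumptions made for this example; none of them comes from the paper, and this is not presented as the authors' method.

```python
# Toy MDP (hypothetical): P[s][a][t] is the probability of moving from
# state s to state t under action a; R[s][a] is the immediate reward.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[1.0, 0.0], [0.0, 1.0]],  # state 1: action 0 moves to state 0, action 1 stays
]
R = [
    [0.0, 0.0],  # no reward in state 0
    [0.0, 1.0],  # reward 1 for staying in state 1
]
GAMMA = 0.9  # discount factor (assumed value)


def value_iteration(P, R, gamma, tol=1e-10):
    """Iterate the Bellman optimality operator until successive value
    estimates differ by less than tol, then return the values together
    with a greedy (stationary) policy."""
    n_states = len(P)
    v = [0.0] * n_states
    while True:
        # Action values for every state under the current estimate v.
        q = [
            [
                R[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))
                for a in range(len(P[s]))
            ]
            for s in range(n_states)
        ]
        new_v = [max(row) for row in q]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            # The greedy policy maps each state to one action, with no
            # dependence on the time step: it is stationary.
            policy = [row.index(max(row)) for row in q]
            return new_v, policy
        v = new_v


v_star, policy = value_iteration(P, R, GAMMA)
```

In this toy problem the greedy policy picks action 1 in both states (move to state 1, then stay there), and the values approach 9 and 10 respectively, since staying in state 1 forever is worth 1 / (1 - 0.9) = 10.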
Mar-14-2024, 11:54:56 GMT