On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes

Neural Information Processing Systems 

We consider infinite-horizon stationary γ-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy.
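As a minimal illustration of this known fact, the sketch below runs value iteration on a toy two-state, two-action MDP and reads off a greedy policy that depends only on the current state, i.e. a stationary one. The MDP itself, the function name `value_iteration`, and the tolerance are assumptions made for illustration and do not come from the paper.

```python
# Illustrative sketch only: the toy MDP and all names here are hypothetical.

def value_iteration(P, r, gamma, tol=1e-10, max_iter=10_000):
    """Approximate the optimal value function and a greedy stationary policy.

    P[a][s][t] is the probability of moving from state s to state t under
    action a; r[s][a] is the immediate reward; gamma is the discount factor.
    """
    n_states = len(r)
    n_actions = len(r[0])

    def q_values(v):
        # q[s][a] = r(s, a) + gamma * sum_t P(t | s, a) * v(t)
        return [
            [r[s][a] + gamma * sum(P[a][s][t] * v[t] for t in range(n_states))
             for a in range(n_actions)]
            for s in range(n_states)
        ]

    v = [0.0] * n_states
    for _ in range(max_iter):
        # Bellman optimality backup: maximize over actions in every state.
        v_new = [max(row) for row in q_values(v)]
        gap = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if gap < tol:
            break

    # The greedy policy maps each state to one action, independent of time.
    policy = [max(range(n_actions), key=lambda a: row[a]) for row in q_values(v)]
    return v, policy


# Toy MDP (assumed): action 0 stays in place, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
# Reward 1 only for staying in state 1.
r = [
    [0.0, 0.0],
    [1.0, 0.0],
]

v_star, pi_star = value_iteration(P, r, gamma=0.9)
print(v_star, pi_star)
```

In this toy example the greedy policy switches out of state 0 and stays in state 1, and the values approach 9 and 10 respectively, since staying in state 1 forever yields 1/(1 - 0.9) = 10.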