On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
