Reviews: Non-Stationary Markov Decision Processes, a Worst-Case Approach using Model-Based Reinforcement Learning

Neural Information Processing Systems 

UPDATE: I have read the authors' response and increased my score. Specifically, the authors corrected my misunderstanding of Property 1 and properly framed the relaxation of the problem in Section 5. Please include similar clarifications in the final version.

There was also considerable discussion among the reviewers about how the paper relates to the Robust MDP literature, which the current work needs to cover more thoroughly. Papers such as "Reinforcement Learning in Robust Markov Decision Processes" and "Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions" were brought up by other reviewers; they appear applicable in the current setting and could serve as empirical competitors to RATS.

I very much like the constraints used to study planning in non-stationary environments in this paper, and the min-max-inspired RATS algorithm seems like an appropriate game-theoretic approach.
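For context, the worst-case (min-max) planning idea referred to above can be sketched roughly as follows. This is an illustrative toy only, not the authors' actual RATS implementation: the agent maximizes return while "nature" adversarially picks the transition model from an uncertainty set at each step. The two-state MDP, the model set `MODELS`, and the reward function are all hypothetical.

```python
# Toy worst-case planning: max over actions, min over candidate models.
# States: 0, 1; actions: 0, 1. Each candidate model is a deterministic
# map (state, action) -> next state. All values here are made up.
MODELS = [
    {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1},  # hypothetical model A
    {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0},  # hypothetical model B
]

def reward(state, action):
    # Hypothetical reward: being in state 1 is good.
    return 1.0 if state == 1 else 0.0

def worst_case_value(state, depth, gamma=0.9):
    """Finite-horizon minimax value: agent maximizes over actions,
    nature minimizes over the models in the uncertainty set."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for action in (0, 1):
        # Nature picks the model that hurts this action the most.
        worst = min(
            reward(state, action)
            + gamma * worst_case_value(m[(state, action)], depth - 1, gamma)
            for m in MODELS
        )
        best = max(best, worst)
    return best

def worst_case_action(state, depth=3, gamma=0.9):
    """Greedy action with respect to the worst-case value."""
    return max((0, 1), key=lambda a: min(
        reward(state, a)
        + gamma * worst_case_value(m[(state, a)], depth - 1, gamma)
        for m in MODELS
    ))
```

In this toy, both candidate models agree that action 1 from state 0 reaches state 1, so the agent can guarantee a positive worst-case return despite the adversarial model choice; the min-max recursion is what gives the game-theoretic flavor the review refers to.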