Non-Stationary Markov Decision Processes, a Worst-Case Approach using Model-Based Reinforcement Learning
Erwan Lecarpentier, Emmanuel Rachelson
–Neural Information Processing Systems
This work tackles the problem of robust zero-shot planning in non-stationary stochastic environments. We study Markov Decision Processes (MDPs) evolving over time and consider Model-Based Reinforcement Learning algorithms in this setting. We make two hypotheses: 1) the environment evolves continuously with a bounded evolution rate; 2) a current model is known at each decision epoch, but not its evolution. Our contribution is fourfold. We introduce the notion of regular evolution by making a Lipschitz continuity hypothesis on the transition and reward functions w.r.t. time.
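The Lipschitz continuity hypothesis above can be made concrete with a small sketch: if the reward function is L_r-Lipschitz in time, then a model snapshot taken at decision epoch t0 yields a worst-case (pessimistic) lower bound on the reward at any later time t. The state/action names, reward values, and the constant `L_r` below are illustrative assumptions, not taken from the paper.

```python
# Hypothetical snapshot of a non-stationary MDP's reward function at epoch t0.
# Assumption: the true reward at time t deviates from the snapshot by at most
# L_r * |t - t0|, where L_r is the (assumed) Lipschitz constant w.r.t. time.

L_r = 0.1  # assumed Lipschitz constant of the reward function w.r.t. time
reward_t0 = {("s0", "a0"): 1.0, ("s0", "a1"): 0.5}  # snapshot rewards at t0

def worst_case_reward(s, a, t, t0=0.0):
    """Lower bound on the reward of (s, a) at time t, from the snapshot at t0."""
    return reward_t0[(s, a)] - L_r * abs(t - t0)

def robust_greedy_action(s, t):
    """Pick the action maximizing the worst-case (pessimistic) reward."""
    return max(("a0", "a1"), key=lambda a: worst_case_reward(s, a, t))
```

For example, two time units after the snapshot, every reward bound has decayed by 0.2, and the robust greedy choice in `s0` is still `a0`. The full worst-case planner in the paper reasons over entire trajectories, not just one step; this sketch only illustrates the bounded-evolution idea.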
Mar-18-2020, 23:31:31 GMT