A Unified Principle of Pessimism for Offline Reinforcement Learning under Model Mismatch

Neural Information Processing Systems 

To tackle these issues, we propose a unified principle of pessimism using distributionally robust Markov decision processes.