Bellman-consistent Pessimism for Offline Reinforcement Learning

Neural Information Processing Systems 

The use of pessimism when reasoning about datasets that lack exhaustive exploration has recently gained prominence in offline reinforcement learning.