Doubly Mild Generalization for Offline Reinforcement Learning

Yixiu Mao, Qi Wang, Yun Qu

Neural Information Processing Systems 

Offline Reinforcement Learning (RL) suffers from extrapolation error and value overestimation. From a generalization perspective, this issue can be attributed to the over-generalization of value functions or policies towards out-of-distribution (OOD) actions.