Offline Reinforcement Learning with Reverse Model-based Imagination
Neural Information Processing Systems
However, in many real-world applications, collecting sufficient exploratory interactions is usually impractical, because online data collection can be costly or even dangerous, such as in healthcare [4] and autonomous driving [5]. To address this challenge, offline RL [6, 7] develops a new learning paradigm that trains RL agents only with pre-collected offline datasets and thus can abstract away from the cost of online exploration [8-17].
Feb-11-2026, 22:58:47 GMT