Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
Ming Yin 1,2 and Yu-Xiang Wang 1
1 Department of Computer Science, UC Santa Barbara
Neural Information Processing Systems
Prior works study this problem under different data-coverage assumptions, and their learning guarantees are expressed in terms of covering coefficients, which lack an explicit characterization in terms of system quantities.
Oct-2-2025, 20:03:54 GMT