Towards Instance-Optimal Offline Reinforcement Learning with Pessimism

Dec-23-2025, 20:59:09 GMT–Neural Information Processing Systems

We study the \emph{offline reinforcement learning} (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown \emph{Markov Decision Process} (MDP) using data collected by a behavior policy $\mu$. In particular, we consider the sample complexity of offline RL for finite-horizon MDPs. Prior works derive information-theoretic lower bounds under different data-coverage assumptions, and their upper bounds are expressed in terms of coverage coefficients, which lack an explicit characterization in terms of system quantities.
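To make the setting concrete, the following is a minimal sketch of generic pessimistic value iteration on a tabular finite-horizon MDP, using only logged transitions from the behavior policy $\mu$. It is not the algorithm analyzed in the paper: the function name `pessimistic_value_iteration`, the constant `c`, the count-based penalty `c * H / sqrt(n)`, and the tuple layout of the dataset are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def pessimistic_value_iteration(dataset, n_states, n_actions, horizon, c=1.0):
    """Illustrative sketch (assumed form, not the paper's method).

    dataset: list of (h, s, a, r, s_next) tuples with h in 0..horizon-1,
             logged by the behavior policy mu.
    Returns a greedy policy and pessimistic value estimates.
    """
    counts = defaultdict(int)
    reward_sum = defaultdict(float)
    next_counts = defaultdict(lambda: defaultdict(int))
    for h, s, a, r, s_next in dataset:
        counts[(h, s, a)] += 1
        reward_sum[(h, s, a)] += r
        next_counts[(h, s, a)][s_next] += 1

    # V[horizon] is the terminal value, fixed at zero.
    V = [[0.0] * n_states for _ in range(horizon + 1)]
    policy = [[0] * n_states for _ in range(horizon)]

    for h in reversed(range(horizon)):
        for s in range(n_states):
            best_q, best_a = -math.inf, 0
            for a in range(n_actions):
                n = counts.get((h, s, a), 0)
                if n == 0:
                    # Unobserved pair: pessimism assigns the lowest value.
                    q = 0.0
                else:
                    r_hat = reward_sum[(h, s, a)] / n
                    next_v = sum(
                        cnt / n * V[h + 1][s2]
                        for s2, cnt in next_counts[(h, s, a)].items()
                    )
                    # Assumed Hoeffding-style penalty, shrinking with the count.
                    penalty = c * horizon * math.sqrt(1.0 / n)
                    q = max(0.0, r_hat + next_v - penalty)
                if q > best_q:
                    best_q, best_a = q, a
            V[h][s] = best_q
            policy[h][s] = best_a
    return policy, V
```

With a single state, horizon 1, and a dataset in which action 0 is logged 100 times with reward 0.5 while action 1 is logged once with reward 1.0, the sketch prefers action 0: the penalty on the rarely observed action outweighs its higher empirical reward. This is the sense in which the learned policy depends on how well $\mu$ covers each state-action pair.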


