Towards Instance-Optimal Offline Reinforcement Learning with Pessimism

Apr-25-2026, 01:53:36 GMT–Neural Information Processing Systems

We study the offline reinforcement learning (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) using data collected by a behavior policy µ. In particular, we consider the sample complexity of offline RL for finite-horizon MDPs. Prior works study this problem under different data-coverage assumptions, and their learning guarantees are expressed in terms of covering coefficients, which lack an explicit characterization in terms of system quantities.
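
The abstract does not spell out an algorithm. As a rough illustration of what "pessimism" means in the tabular, finite-horizon setting it describes, the sketch below runs backward value iteration on empirical estimates built from the offline data and subtracts a Hoeffding-style penalty that shrinks as the visit count of (h, s, a) grows. This is a generic lower-confidence-bound sketch, not the paper's method; the function name `pessimistic_value_iteration`, the penalty constant `c`, the confidence level `delta`, the episode format, and the assumption that rewards lie in [0, 1] are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def pessimistic_value_iteration(episodes, n_states, n_actions, horizon, c=1.0, delta=0.1):
    """Hypothetical sketch: tabular value iteration with a pessimism penalty.

    episodes: offline trajectories collected by the behavior policy; each is a
    list of `horizon` tuples (s, a, r, s_next), with rewards assumed in [0, 1].
    Returns a greedy policy policy[h][s] and the pessimistic values V_0(s).
    """
    # Empirical visit counts, reward sums and next-state counts per (h, s, a).
    n = defaultdict(int)
    r_sum = defaultdict(float)
    next_counts = defaultdict(lambda: defaultdict(int))
    for ep in episodes:
        for h, (s, a, r, s_next) in enumerate(ep):
            n[(h, s, a)] += 1
            r_sum[(h, s, a)] += r
            next_counts[(h, s, a)][s_next] += 1

    log_term = math.log(n_states * n_actions * horizon / delta)
    V = [0.0] * n_states  # value after the final step is zero
    policy = [[0] * n_states for _ in range(horizon)]

    # Backward induction from the last step h = horizon - 1 down to h = 0.
    for h in reversed(range(horizon)):
        V_new = [0.0] * n_states
        for s in range(n_states):
            best_q, best_a = 0.0, 0
            for a in range(n_actions):
                cnt = n.get((h, s, a), 0)
                if cnt == 0:
                    q = 0.0  # no data: fall back to the lowest possible value
                else:
                    r_hat = r_sum[(h, s, a)] / cnt
                    p_v = sum(k * V[sn] for sn, k in next_counts[(h, s, a)].items()) / cnt
                    # Hoeffding-style penalty: shrinks as (h, s, a) is visited more often.
                    bonus = c * (horizon - h) * math.sqrt(log_term / cnt)
                    q = max(0.0, r_hat + p_v - bonus)
                if q > best_q:
                    best_q, best_a = q, a
            V_new[s] = best_q
            policy[h][s] = best_a
        V = V_new
    return policy, V
```

For example, in a one-step problem where the data contain 100 episodes taking action 1 from state 0 (reward 0.9) and a single episode taking action 0 (reward 1.0), the sketch picks action 1: the lone observation of action 0 is penalized down to zero, which is the intended effect of pessimism when the behavior policy µ covers some actions poorly.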

Keywords: artificial intelligence, machine learning, reinforcement learning