Provably (More) Sample-Efficient Offline RL with Options

Jan-20-2025, 00:02:45 GMT–Neural Information Processing Systems

Recent work shows that options improve sample efficiency in online RL. However, these results do not carry over to settings where exploring the environment online is risky, e.g., automated driving and healthcare. In this paper, we provide the first sample-complexity analysis of offline RL with options, where the agent learns from a dataset without further interaction with the environment. We derive a novel information-theoretic lower bound, which generalizes the one for offline learning with actions. We propose the PEssimistic Value Iteration for Learning with Options (PEVIO) algorithm and establish near-optimal suboptimality bounds for two popular data-collection procedures: the first collects state-option transitions, and the second collects state-action transitions.
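
To make the idea of pessimism over options concrete, the following is a minimal tabular sketch, not the paper's PEVIO algorithm. It assumes a finite-horizon problem with per-step rewards in [0, 1], a dataset of state-option transitions, and a simple count-based penalty; the function name, the record layout `(h, s, o, u, tau, s_next)`, and the penalty form `beta * (horizon - h) / sqrt(n)` are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import sqrt


def pessimistic_option_values(dataset, n_states, n_options, horizon, beta=1.0):
    """Backward value iteration with a count-based pessimism penalty.

    Each record is (h, s, o, u, tau, s_next): option o was started at step h
    in state s, ran for tau >= 1 steps, collected cumulative reward u, and
    terminated in s_next. This layout is an assumption of the sketch.
    """
    # Group the observed outcomes by (step, state, option).
    samples = defaultdict(list)
    for h, s, o, u, tau, s_next in dataset:
        samples[(h, s, o)].append((u, tau, s_next))

    # values[horizon] stays at zero: nothing can be earned after the last step.
    values = [[0.0] * n_states for _ in range(horizon + 1)]
    policy = [[0] * n_states for _ in range(horizon)]

    for h in reversed(range(horizon)):
        for s in range(n_states):
            best_q, best_o = 0.0, 0
            for o in range(n_options):
                seen = samples.get((h, s, o), [])
                if not seen:
                    continue  # unseen pairs keep the most pessimistic value, 0
                n = len(seen)
                # Empirical mean of reward plus value at the termination step.
                target = sum(
                    u + values[min(h + tau, horizon)][s2] for u, tau, s2 in seen
                ) / n
                # Fewer samples means a larger penalty (hypothetical form).
                penalty = beta * (horizon - h) / sqrt(n)
                q = min(max(target - penalty, 0.0), float(horizon - h))
                if q > best_q:
                    best_q, best_o = q, o
            values[h][s] = best_q
            policy[h][s] = best_o
    return values, policy


# Option 0 is well covered with a modest return; option 1 looks better
# but is observed only once, so its penalty wipes out its estimate.
dataset = [(0, 0, 0, 1.0, 2, 1)] * 100 + [(0, 0, 1, 2.0, 2, 1)]
values, policy = pessimistic_option_values(dataset, n_states=2, n_options=2, horizon=2)
print(policy[0][0], values[0][0])
```

In this toy dataset the learned policy selects the well-covered option in state 0 at step 0, which is the qualitative behavior pessimism is meant to produce. The bounds and the exact penalty used in the paper are not reproduced here.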

data-collection procedure, provably, sample-efficient offline rl

Technology:
- Information Technology > Artificial Intelligence (0.84)