AITopics | offline rl

Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage

Neural Information Processing SystemsApr-29-2026, 21:18:58 GMT

We study distributionally robust offline reinforcement learning (RL), which seeks to find an optimal robust policy purely from an offline dataset that can perform well in perturbed environments. We propose a generic algorithm framework Doubly Pessimistic Model-based Policy Optimization (P2MPO) for robust offline RL, which features a novel combination of a flexible model estimation subroutine and a doubly pessimistic policy optimization step. Here the double pessimism principle is crucial to overcome the distribution shift incurred by i) the mismatch between behavior policy and the family of target policies; and ii) the perturbation of the nominal model. Under certain accuracy assumptions on the model estimation subroutine, we show that P2MPOis provably sample-efficient with robust partial coverage data, which means that the offline dataset has good coverage of the distributions induced by the optimal robust policy and perturbed models around the nominal model. By tailoring specific model estimation subroutines for concrete examples including tabular Robust Markov Decision Process (RMDP), factored RMDP, and RMDP with kernel and neural function approximations, we show that P2MPO enjoys a eO(n 1/2) convergence rate, where nis the number of trajectories in the offline dataset. Notably, these models, except for the tabular case, are first identified and proven tractable by this paper. To the best of our knowledge, we first propose a general learning principle -- double pessimism -- for robust offline RL and show that it is provably efficient in the context of general function approximations.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Add feedback

ade04fd4f26263f86b47ffb535c4cafb-Paper-Conference.pdf

Neural Information Processing SystemsApr-29-2026, 10:08:33 GMT

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Genre: Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

3f24bb08a5741e4197af64e1f93a5029-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 14:22:23 GMT

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.73)

Add feedback

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Neural Information Processing SystemsApr-25-2026, 13:16:26 GMT

Offline reinforcement learning (offline RL), which aims to find an optimal policy from a previously collected static dataset, bears algorithmic difficulties due to function approximation errors from out-of-distribution (OOD) data points. To this end, offline RL algorithms adopt either a constraint or a penalty term that explicitly guides the policy to stay close to the given dataset. However, prior methods typically require accurate estimation of the behavior policy or sampling from OOD data points, which themselves can be a non-trivial problem. Moreover, these methods under-utilize the generalization ability of deep neural networks and often fall into suboptimal solutions too close to the given dataset. In this work, we propose an uncertainty-based offline RL method that takes into account the confidence of the Q-value prediction and does not require any estimation or sampling of the data distribution. We show that the clipped Q-learning, a technique widely used in online RL, can be leveraged to successfully penalize OOD data points with high prediction uncertainties. Surprisingly, we find that it is possible to substantially outperform existing offline RL methods on various tasks by simply increasing the number of Q-networks along with the clipped Q-learning. Based on this observation, we propose an ensemble-diversified actor-critic algorithm that reduces the number of required ensemble networks down to a tenth compared to the naive ensemble while achieving state-of-the-art performance on most of the D4RL benchmarks considered.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Towards Instance-Optimal Offline Reinforcement Learning with Pessimism

Neural Information Processing SystemsApr-25-2026, 01:53:36 GMT

We study the offline reinforcement learning (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) using the data coming from a policy µ. In particular, we consider the sample complexity problems of offline RL for finite-horizon MDPs. Prior works study this problem based on different data-coverage assumptions, and their learning guarantees are expressed by the covering coefficients which lack the explicit characterization of system quantities.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

121db870b0470dd63bb5bc59c724275a-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 01:34:10 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

11e1900e680f5fe1893a8e27362dbe2c-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 01:03:14 GMT

arxiv preprint arxiv, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.95)
(2 more...)

Add feedback

07d5938693cc3903b261e1a3844590ed-Supplemental.pdf

Neural Information Processing SystemsApr-24-2026, 13:11:19 GMT

machine learning, probability 1, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

0429ececfb199efc93182990169e73bb-Paper-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 06:50:51 GMT

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Hybrid Reinforcement Learning Breaks Sample Size Barriers In Linear MDPs

Neural Information Processing SystemsMar-22-2026, 16:05:41 GMT

Hybrid Reinforcement Learning (RL), where an agent learns from both an offline dataset and online explorations in an unknown environment, has garnered significant recent interest. A crucial question posed by Xie et al. (2022) is whether hybrid RL can improve upon the existing lower bounds established in purely offline and purely online RL without relying on the single-policy concentrability assumption. While Li et al. (2023) provided an affirmative answer to this question in the tabular PAC RL case, the question remains unsettled for both the regret-minimizing RL case and the non-tabular case. In this work, building upon recent advancements in offline RL and reward-agnostic exploration, we develop computationally efficient algorithms for both PAC and regret-minimizing RL with linear function approximation, without requiring concentrability on the entire state-action space. We demonstrate that these algorithms achieve sharper error or regret bounds that are no worse than, and can improve on, the optimal sample complexity in offline RL (the first algorithm, for PAC RL) and online RL (the second algorithm, for regret-minimizing RL) in linear Markov decision processes (MDPs), regardless of the quality of the behavior policy. To our knowledge, this work establishes the tightest theoretical guarantees currently available for hybrid RL in linear MDPs.

artificial intelligence, machine learning, reinforcement learning, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.30)

Add feedback

Filters

Collaborating Authors

offline rl

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage

ade04fd4f26263f86b47ffb535c4cafb-Paper-Conference.pdf

3f24bb08a5741e4197af64e1f93a5029-Paper.pdf

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Towards Instance-Optimal Offline Reinforcement Learning with Pessimism

121db870b0470dd63bb5bc59c724275a-Paper-Conference.pdf

11e1900e680f5fe1893a8e27362dbe2c-Paper-Conference.pdf

07d5938693cc3903b261e1a3844590ed-Supplemental.pdf

0429ececfb199efc93182990169e73bb-Paper-Conference.pdf

Hybrid Reinforcement Learning Breaks Sample Size Barriers In Linear MDPs