RORL: Robust Offline Reinforcement Learning via Conservative Smoothing
Rui Yang, Chenjia Bai, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han
Offline reinforcement learning (RL) provides a promising direction for exploiting massive amounts of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when they encounter observation deviations under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these states. Theoretically, we show that RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL achieves state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.
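To make the conservative smoothing idea concrete, below is a minimal PyTorch sketch of the two regularizers the abstract describes: smoothing the value function and the policy on states near the dataset. All names here (QNetwork, smoothing_losses, eps), the deterministic actor, and the use of uniform noise instead of adversarial perturbations are illustrative assumptions, not the paper's exact objective.

```python
# Hedged sketch of conservative smoothing: regularize Q and the policy on
# perturbed states near dataset states. Uniform noise stands in for the
# adversarial perturbations used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Minimal state-action value network."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def smoothing_losses(q_net, actor, states, actions, eps=0.005):
    """Value- and policy-smoothing losses on perturbed states.

    Perturbed states are drawn uniformly from an l_inf ball of radius `eps`
    around each dataset state (a simplification; an adversarial version
    would target the worst case inside the ball).
    """
    noise = (torch.rand_like(states) * 2.0 - 1.0) * eps
    perturbed = states + noise

    # Value smoothing: keep Q at perturbed states close to Q at the dataset
    # states, penalizing upward deviations more heavily so the smoothing
    # stays conservative (no optimism off the data).
    diff = q_net(perturbed, actions) - q_net(states, actions).detach()
    q_loss = torch.where(diff > 0, diff.pow(2), 0.5 * diff.pow(2)).mean()

    # Policy smoothing: the (deterministic) actor should output similar
    # actions for a dataset state and its perturbed neighbor.
    pi_loss = F.mse_loss(actor(perturbed), actor(states).detach())
    return q_loss, pi_loss
```

In practice these two terms would be added, with trade-off weights, to a standard off-policy actor-critic loss; the additional conservative value estimation on out-of-distribution perturbed states that the abstract mentions is omitted from this sketch.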
A Theoretical Analysis
In this section, we provide a detailed theoretical analysis and proofs in linear MDPs [23].

A.1 LSVI Solution
In linear MDPs, we assume that the transition dynamics and reward function are linear in a known feature map $\phi(s, a)$, i.e., $P_h(s' \mid s, a) = \langle \phi(s, a), \mu_h(s') \rangle$ and $r_h(s, a) = \langle \phi(s, a), \theta_h \rangle$. Theorem 1 and Corollary 1 are then restated and proved, together with a supporting lemma. In experiments, we do not use explicit constraints (e.g., spectral regularization) to enforce the upper bound given in Corollary 1.
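Since Section A.1 is titled "LSVI Solution", the object being analyzed is presumably the standard least-squares value iteration (LSVI) estimate for linear MDPs, following [23]. The display below is the textbook ridge-regression form with a pessimistic bonus, as is common in offline linear-MDP analyses; it is not necessarily the paper's exact equation, and $\lambda$, $\beta$ are generic constants.

```latex
% Standard LSVI solution over a dataset D = {(s_h^tau, a_h^tau, r_h^tau)}_{tau=1}^{K}
% (textbook form); pessimism subtracts an elliptical uncertainty bonus.
\begin{align}
  \Lambda_h &= \sum_{\tau=1}^{K} \phi(s_h^\tau, a_h^\tau)\,
               \phi(s_h^\tau, a_h^\tau)^\top + \lambda I, \\
  \hat{w}_h &= \Lambda_h^{-1} \sum_{\tau=1}^{K} \phi(s_h^\tau, a_h^\tau)\,
               \bigl( r_h^\tau + \hat{V}_{h+1}(s_{h+1}^\tau) \bigr), \\
  \hat{Q}_h(s, a) &= \phi(s, a)^\top \hat{w}_h
               - \beta \sqrt{\phi(s, a)^\top \Lambda_h^{-1}\, \phi(s, a)}.
\end{align}
```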