Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization

Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner

Mar-3-2023–arXiv.org Artificial Intelligence

Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy. Model-based approaches are particularly appealing in the offline setting because they can extract more learning signal from the logged dataset by learning a model of the environment. However, the performance of existing model-based approaches falls short of their model-free counterparts because estimation errors compound in the learned model. Driven by this observation, we argue that it is critical for a model-based method to understand when to trust the model and when to rely on model-free estimates, and how to act conservatively with respect to both. To this end, we derive an elegant and simple methodology called conservative Bayesian model-based value expansion for offline policy optimization (CBOP), which trades off model-free and model-based estimates during the policy evaluation step according to their epistemic uncertainties, and facilitates conservatism by taking a lower bound on the Bayesian posterior value estimate. On the standard D4RL continuous control tasks, we find that our method significantly outperforms previous model-based approaches: MOPO by 116.4%, MOReL by 23.2%, and COMBO by 23.7%. Further, CBOP achieves state-of-the-art performance on 11 out of 18 benchmark datasets while performing on par on the remaining datasets.
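The trade-off described in the abstract can be illustrated with a small sketch. The abstract does not give the estimator itself, so everything below is an assumption made for illustration: each candidate value estimate (for example, a model-free estimate and several model-based rollouts of different lengths) is treated as an independent Gaussian with a known mean and variance, the estimates are fused by precision weighting under a flat prior, and conservatism is applied as a lower confidence bound with a hypothetical `penalty` coefficient. The function name and parameters are not taken from the paper.

```python
import numpy as np


def conservative_posterior_target(means, variances, penalty=1.0):
    """Fuse value estimates by precision weighting and return a lower bound.

    Assumptions (illustrative only, not taken from the paper):
      * means[i], variances[i] describe independent Gaussian estimates
        of the same target value;
      * the prior is flat, so the posterior is the product of the Gaussians;
      * `penalty` scales how many posterior standard deviations to subtract.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    if np.any(variances <= 0):
        raise ValueError("variances must be strictly positive")

    # Uncertain estimates (large variance) receive small weight.
    precisions = 1.0 / variances
    post_var = 1.0 / precisions.sum()
    post_mean = post_var * (precisions * means).sum()

    # Conservative target: lower confidence bound on the posterior.
    lower_bound = post_mean - penalty * np.sqrt(post_var)
    return post_mean, post_var, lower_bound


if __name__ == "__main__":
    # One confident estimate and two increasingly uncertain ones.
    mean, var, lcb = conservative_posterior_target(
        means=[10.0, 12.0, 20.0],
        variances=[1.0, 4.0, 100.0],
        penalty=1.0,
    )
    print(f"posterior mean={mean:.3f}, variance={var:.3f}, lower bound={lcb:.3f}")
```

Under these assumptions, an estimate with very high variance barely moves the fused value, which mirrors the idea of relying less on whichever source is currently less trustworthy.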

artificial intelligence, machine learning, reinforcement learning

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.84)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.70)
