Bayesian Design Principles for Offline-to-Online Reinforcement Learning

Hu, Hao, Yang, Yiqin, Ye, Jianing, Wu, Chengjie, Mai, Ziqing, Hu, Yujing, Lv, Tangjie, Fan, Changjie, Zhao, Qianchuan, Zhang, Chongjie

May-31-2024–arXiv.org Artificial Intelligence

Offline reinforcement learning (RL) is crucial for real-world applications where exploration can be costly or unsafe. However, offline-learned policies are often suboptimal, and further online fine-tuning is required. In this paper, we tackle the fundamental dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop. We show that Bayesian design principles are crucial in resolving this dilemma. Instead of adopting optimistic or pessimistic policies, the agent should act in a way that matches its belief in optimal policies. Such a probability-matching agent can avoid a sudden performance drop while still being guaranteed to find the optimal policy. Based on our theoretical findings, we introduce a novel algorithm that outperforms existing methods on various benchmarks, demonstrating the efficacy of our approach. Overall, the proposed approach provides a new perspective on offline-to-online RL that has the potential to enable more effective learning from offline data.
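The contrast between pessimistic, optimistic, and probability-matching behavior can be illustrated on a toy problem. The following is a minimal sketch, not the paper's algorithm: it assumes a two-armed Bernoulli bandit with Beta posteriors seeded from offline counts, and it uses Thompson sampling as a stand-in for probability matching. The names `select_arm`, `run`, `true_rates`, and `offline_counts`, as well as the two-standard-deviation bonus, are hypothetical choices made for illustration.

```python
import random


def select_arm(posteriors, mode, rng):
    """Choose an arm given Beta(alpha, beta) posteriors over success rates.

    mode is "match" (Thompson sampling), "optimistic", or "pessimistic".
    """
    if mode == "match":
        # Probability matching: sample one plausible success rate per arm
        # from its posterior and act greedily on that sample.
        scores = [rng.betavariate(a, b) for a, b in posteriors]
    else:
        # Optimism / pessimism: posterior mean plus or minus an
        # uncertainty bonus (2 posterior standard deviations, an assumption).
        sign = 1.0 if mode == "optimistic" else -1.0
        scores = []
        for a, b in posteriors:
            mean = a / (a + b)
            std = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5
            scores.append(mean + sign * 2.0 * std)
    return max(range(len(scores)), key=scores.__getitem__)


def run(mode, true_rates, offline_counts, steps=2000, seed=0):
    """Fine-tune online from an offline-initialized posterior.

    offline_counts holds (successes, failures) per arm from the offline data.
    Returns the average online reward and the final posteriors.
    """
    rng = random.Random(seed)
    # Uniform Beta(1, 1) prior updated with the offline counts.
    posteriors = [[1 + s, 1 + f] for s, f in offline_counts]
    total = 0
    for _ in range(steps):
        arm = select_arm(posteriors, mode, rng)
        reward = 1 if rng.random() < true_rates[arm] else 0
        posteriors[arm][0 if reward else 1] += 1  # Bayesian update
        total += reward
    return total / steps, posteriors


if __name__ == "__main__":
    # Arm 0 is well covered by offline data; arm 1 is better but barely seen.
    rates, offline = [0.6, 0.8], [(60, 40), (1, 1)]
    for mode in ("pessimistic", "optimistic", "match"):
        avg, post = run(mode, rates, offline)
        print(mode, round(avg, 3), post)
```

In this setup the pessimistic agent never tries the poorly covered arm, so it cannot discover that the arm is better, while the probability-matching agent tries it in proportion to how plausible it is that the arm is optimal. A bandit has no state distribution shift, so the sketch does not reproduce the sudden performance drop that the abstract attributes to switching directly to optimism.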
