Improving Offline-to-Online Reinforcement Learning with Q-Ensembles
Kai Zhao, Yi Ma, Jianye Hao, Jinyi Liu, Yan Zheng, Zhaopeng Meng
Offline reinforcement learning (RL) is a learning paradigm in which an agent learns from a fixed dataset of experience. However, learning solely from a static dataset can limit performance due to the lack of exploration. To overcome this limitation, offline-to-online RL combines offline pre-training with online fine-tuning, enabling the agent to further refine its policy by interacting with the environment in real time. Despite its benefits, existing offline-to-online RL methods suffer from performance degradation and slow improvement during the online phase. To tackle these challenges, we propose a novel framework called Ensemble-based Offline-to-Online (E2O) RL. By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance. Moreover, to expedite online performance improvement, we appropriately loosen the pessimism of Q-value estimation and incorporate ensemble-based exploration mechanisms into our framework. Experimental results demonstrate that E2O can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods during online fine-tuning on a range of locomotion and navigation tasks, significantly outperforming existing offline-to-online RL methods.
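To make the ensemble idea concrete, below is a minimal sketch of how a Q-ensemble might support pessimistic offline value estimation, a loosened online estimate, and ensemble-based exploration. The class and function names (`QEnsemble`, `pessimistic_value`, `relaxed_value`, `exploration_score`), the specific penalty/bonus formulas, and all hyperparameters are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of ensemble-based Q-value estimation in the spirit of E2O.
# NOTE: names and formulas below are assumptions for illustration only.
import torch
import torch.nn as nn


class QEnsemble(nn.Module):
    """N independent Q-networks over the same (state, action) input."""

    def __init__(self, state_dim: int, action_dim: int, n_nets: int = 10, hidden: int = 256):
        super().__init__()
        self.nets = nn.ModuleList([
            nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_nets)
        ])

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        x = torch.cat([state, action], dim=-1)
        # Stack per-network outputs: shape (n_nets, batch, 1)
        return torch.stack([net(x) for net in self.nets], dim=0)


def pessimistic_value(qs: torch.Tensor) -> torch.Tensor:
    # Offline pre-training: take the minimum over the ensemble to stay
    # conservative about out-of-distribution actions.
    return qs.min(dim=0).values


def relaxed_value(qs: torch.Tensor, weight: float = 0.5) -> torch.Tensor:
    # Online fine-tuning: loosen the pessimism by penalizing with only a
    # fraction of the ensemble standard deviation instead of the hard minimum.
    return qs.mean(dim=0) - weight * qs.std(dim=0)


def exploration_score(qs: torch.Tensor, bonus: float = 1.0) -> torch.Tensor:
    # Ensemble-based exploration: favor actions on which the ensemble
    # disagrees (UCB-style mean + std bonus).
    return qs.mean(dim=0) + bonus * qs.std(dim=0)


if __name__ == "__main__":
    ensemble = QEnsemble(state_dim=17, action_dim=6)
    s, a = torch.randn(32, 17), torch.randn(32, 6)
    qs = ensemble(s, a)                    # (10, 32, 1)
    print(pessimistic_value(qs).shape)     # offline target term
    print(relaxed_value(qs).shape)         # online target term
    print(exploration_score(qs).shape)     # online action-selection score
```

A design note on this sketch: switching from the hard minimum to a mean-minus-weighted-std estimate is one common way to relax ensemble pessimism for the online phase, and the std-based bonus is one common ensemble exploration signal; the paper should be consulted for the exact mechanisms E2O uses.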
arXiv.org Artificial Intelligence
Dec-12-2023