Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning

Wang, Shenzhi; Yang, Qisen; Gao, Jiawei; Lin, Matthieu Gaetan; Chen, Hao; Wu, Liwei; Jia, Ning; Song, Shiji; Huang, Gao

Oct-30-2023, arXiv.org Artificial Intelligence

Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-training on a pre-collected dataset with fine-tuning in an online environment. However, the incorporation of online fine-tuning can intensify the well-known distributional shift problem. Existing solutions tackle this problem by imposing a policy constraint on the policy improvement objective in both offline and online learning. They typically advocate a single balance between policy improvement and constraints across diverse data collections. This one-size-fits-all manner may not optimally leverage each collected sample due to the significant variation in data quality across different states. To this end, we introduce Family Offline-to-Online RL (FamO2O), a simple yet effective framework that empowers existing algorithms to determine state-adaptive improvement-constraint balances. FamO2O utilizes a universal model to train a family of policies with different improvement/constraint intensities, and a balance model to select a suitable policy for each state. Theoretically, we prove that state-adaptive balances are necessary for achieving a higher policy performance upper bound. Empirically, extensive experiments show that FamO2O offers a statistically significant improvement over various existing methods, achieving state-of-the-art performance on the D4RL benchmark.
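The idea of a state-adaptive balance can be illustrated with a toy sketch. This is not the authors' implementation: the quadratic objective, the hand-written `balance_model`, and every name below (`universal_policy`, `act`, `data_quality`) are assumptions made for illustration. In this toy, a policy that trades off moving toward an improvement target against staying close to the dataset action has a closed-form maximizer, so a single function conditioned on the balance coefficient `beta` represents the whole family of policies, and a second function picks `beta` per state.

```python
def universal_policy(a_improve, a_data, beta):
    """Toy 'universal model' (assumed form, not from the paper).

    Returns the action maximizing
        (1 - beta) * -(a - a_improve)**2  +  beta * -(a - a_data)**2,
    i.e. a convex combination of the improvement target and the dataset action.
    beta = 0 means pure improvement; beta = 1 means pure constraint.
    """
    return (1.0 - beta) * a_improve + beta * a_data


def balance_model(data_quality):
    """Toy 'balance model' (hand-written here; a learned model in practice).

    Maps a hypothetical per-state data-quality score to a constraint
    weight clamped to [0, 1]: better data gets a stronger constraint.
    """
    return min(1.0, max(0.0, data_quality))


def act(a_improve, a_data, data_quality):
    """Select a balance for this state, then query the matching family member."""
    beta = balance_model(data_quality)
    return universal_policy(a_improve, a_data, beta)


# States with low-quality data lean on improvement; high-quality data on the constraint.
print(act(a_improve=1.0, a_data=0.0, data_quality=0.0))   # 1.0
print(act(a_improve=1.0, a_data=0.0, data_quality=0.25))  # 0.75
print(act(a_improve=1.0, a_data=0.0, data_quality=1.0))   # 0.0
```

A single fixed `beta` shared by all states corresponds to the one-size-fits-all balance the abstract argues against; letting `beta` depend on the state is the only change this sketch makes.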

Country:
- North America > United States (0.04)
- Europe > Switzerland
  - Basel-City > Basel (0.04)
- Asia > China
  - Guangdong Province (0.04)
  - Beijing > Beijing (0.04)

Genre:
- Research Report (1.00)
- Instructional Material > Online (0.61)

Industry:
- Education > Educational Setting > Online (0.48)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
