Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning

Jan-19-2025, 15:57:05 GMT–Neural Information Processing Systems

Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-training on a pre-collected dataset with fine-tuning in an online environment. However, adding online fine-tuning can intensify the well-known distributional shift problem. Existing solutions address this problem by imposing a policy constraint on the policy improvement objective in both offline and online learning. They typically apply a single, fixed balance between policy improvement and the constraint across all collected data. This one-size-fits-all approach may not make the best use of each collected sample, because data quality varies significantly across states.

offline-to-online reinforcement learning, state-adaptive balance, train once, (3 more...)

Genre:
- Instructional Material > Online (0.64)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)