Offline Reinforcement Learning with Behavioral Supervisor Tuning
Srinivasan, Padmanaba, Knottenbelt, William
arXiv.org Artificial Intelligence
Offline reinforcement learning (RL) algorithms are applied to learn performant, well-generalizing policies when provided with a static dataset of interactions. Many recent approaches to offline RL have seen substantial success, but with one key caveat: they demand substantial per-dataset hyperparameter tuning to achieve reported performance, which requires policy rollouts in the environment to evaluate; this can rapidly become cumbersome. Furthermore, substantial tuning requirements can hamper the adoption of these algorithms in practical domains. In this paper, we present TD3 with Behavioral Supervisor Tuning (TD3-BST), an algorithm that trains an uncertainty model and uses it to guide the policy to select actions within the dataset support. TD3-BST can learn more effective policies from offline datasets compared to previous methods and achieves the best performance across challenging benchmarks without requiring per-dataset tuning.
Apr-25-2024
- Country:
  - North America > United States
    - Massachusetts > Middlesex County > Cambridge (0.04)
  - Europe > United Kingdom
    - England > Cambridgeshire > Cambridge (0.14)
  - Asia > Middle East
    - Jordan (0.04)
- Genre:
  - Research Report (0.40)
- Industry:
  - Health & Medicine (0.68)
- Technology: