Offline Reinforcement Learning with Behavioral Supervisor Tuning

Srinivasan, Padmanaba; Knottenbelt, William

Apr-25-2024–arXiv.org Artificial Intelligence

Offline reinforcement learning (RL) algorithms are applied to learn performant, well-generalizing policies when provided with a static dataset of interactions. Many recent approaches to offline RL have seen substantial success, but with one key caveat: they demand substantial per-dataset hyperparameter tuning to achieve reported performance, which requires policy rollouts in the environment to evaluate; this can rapidly become cumbersome. Furthermore, substantial tuning requirements can hamper the adoption of these algorithms in practical domains. In this paper, we present TD3 with Behavioral Supervisor Tuning (TD3-BST), an algorithm that trains an uncertainty model and uses it to guide the policy to select actions within the dataset support. TD3-BST can learn more effective policies from offline datasets compared to previous methods and achieves the best performance across challenging benchmarks without requiring per-dataset tuning.

dataset, Morse network, reinforcement learning

Country:
- North America > United States
  - Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.14)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.40)

Industry:
- Health & Medicine (0.68)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
