Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
