Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning

Open in new window