Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data
