Beyond Policy Optimization: A Data Curation Flywheel for Sparse-Reward Long-Horizon Planning
