Shrinking the Variance: Shrinkage Baselines for Reinforcement Learning with Verifiable Rewards
