Spurious Rewards: Rethinking Training Signals in RLVR
