Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation
