Uniform-Correct Policy Optimization: Breaking RLVR's Indifference to Diversity
