DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
