Calibrated Reasoning: An Explanatory Verifier for Dynamic and Efficient Problem-Solving
Garg, Anisha; Tekin, Engin; More, Yash; Bick, David; Neema, Nishit; Venkatesh, Ganesh
arXiv.org Artificial Intelligence
Advanced test-time compute strategies are essential for scaling reasoning models, but their effectiveness is capped by the models' poor self-evaluation. We propose a pairwise Explanatory Verifier, trained with reinforcement learning (Group Relative Policy Optimization, GRPO), that produces calibrated confidence scores together with natural-language reasoning for generated solutions. Our verifier improves the accuracy and efficiency of test-time strategies such as best-of-n and self-reflection. Crucially, it excels at identifying challenging failure modes, such as cases where both candidate solutions are incorrect in the same way, succeeding where standard methods like majority voting fail.
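To make the contrast with majority voting concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The pairwise verifier is modeled as a hypothetical function `verify(problem, a, b)` that returns a confidence in [0, 1] for each of two candidates. The names `toy_verifier` and `pairwise_best_of_n`, the knockout-style comparison order, and the 0.5 abstention threshold are all illustrative choices. The toy verifier cheats by comparing against a known reference answer, standing in for a trained model that would have no such access.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical interface (assumption): verify(problem, a, b) -> (conf_a, conf_b),
# each an estimated probability in [0, 1] that the candidate is correct.
PairwiseVerifier = Callable[[str, str, str], Tuple[float, float]]


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the first one encountered."""
    return Counter(answers).most_common(1)[0][0]


def pairwise_best_of_n(
    problem: str,
    answers: List[str],
    verify: PairwiseVerifier,
    threshold: float = 0.5,  # illustrative abstention cutoff, not from the paper
) -> Optional[str]:
    """Knockout best-of-n: the current winner faces each remaining candidate.

    Returns None when the surviving candidate's confidence is below
    `threshold`, signalling that every candidate may be wrong (for example,
    to trigger resampling or a self-reflection round).
    """
    if len(answers) < 2:
        raise ValueError("pairwise selection needs at least two candidates")
    best, best_conf = answers[0], 0.0
    for challenger in answers[1:]:
        p_best, p_chal = verify(problem, best, challenger)
        if p_chal > p_best:
            best, best_conf = challenger, p_chal
        else:
            best_conf = p_best
    return best if best_conf >= threshold else None


def toy_verifier(reference: str) -> PairwiseVerifier:
    """Hypothetical stand-in for a trained verifier, for demonstration only."""
    def score(candidate: str) -> float:
        return 0.9 if candidate == reference else 0.1

    def verify(problem: str, a: str, b: str) -> Tuple[float, float]:
        return score(a), score(b)

    return verify


if __name__ == "__main__":
    verify = toy_verifier(reference="42")
    # Two identically wrong samples outvote one correct sample.
    samples = ["41", "41", "42"]
    print(majority_vote(samples))                           # 41
    print(pairwise_best_of_n("q", samples, verify))         # 42
    # Both candidates identically wrong: voting still answers, the verifier abstains.
    print(majority_vote(["41", "41"]))                      # 41
    print(pairwise_best_of_n("q", ["41", "41"], verify))    # None
```

The design point is the abstention path: if the verifier's scores are calibrated, a low best score can be read as evidence that every candidate is wrong, whereas majority voting always returns its most common answer, even when that answer is a repeated error.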
Sep-25-2025
- Country:
- Europe
- Italy > Calabria
- Catanzaro Province > Catanzaro (0.04)
- Monaco (0.04)
- South America > Chile
- Genre:
- Research Report (0.54)
- Technology: