Calibrated Reasoning: An Explanatory Verifier for Dynamic and Efficient Problem-Solving

Garg, Anisha, Tekin, Engin, More, Yash, Bick, David, Neema, Nishit, Venkatesh, Ganesh

Sep-25-2025–arXiv.org Artificial Intelligence

Advanced test-time computing strategies are essential for scaling reasoning models, but their effectiveness is capped by the models' poor self-evaluation. We propose a pairwise Explanatory Verifier, trained via reinforcement learning (GRPO), that produces calibrated confidence scores and associated natural language reasoning for generated solutions. Our verifier improves the accuracy and efficiency of test-time strategies like best-of-n and self-reflection. Crucially, it excels at identifying challenging failure modes, such as when both candidate solutions are identically incorrect, succeeding where standard methods like majority voting fail.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

Sep-25-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.54)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (0.72)
  - Machine Learning > Neural Networks
    - Deep Learning (0.47)