Beyond Correctness: Confidence-Aware Reward Modeling for Enhancing Large Language Model Reasoning
