Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection

Jun-23-2025–arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated impressive mathematical reasoning capabilities but remain susceptible to hallucinations (plausible yet incorrect statements), particularly in complex domains requiring rigorous logical deduction. Current approaches to improve reliability often neglect the logical consistency of intermediate reasoning steps, focusing primarily on final answer verification. We propose a structured self-consistency (SC) framework that systematically evaluates factual concordance across both intermediate reasoning steps and final outputs, thereby creating a hierarchical verification mechanism for mathematical reasoning. Our framework employs a probabilistic formulation that quantifies consistency through ensemble agreement, entropy minimization, and structural isomorphism detection in reasoning graphs. We evaluate our approach on three fundamental mathematical tasks: formal theorem proving, symbolic transformation, and numerical computation. Experimental results demonstrate that our method achieves significant improvements over baseline approaches: proof validity increases by 8.3% (p < 0.01), symbolic reasoning accuracy by 9.6%, and numerical stability by 42.8% while reducing computational overhead by 56.3%. Further analysis reveals that our structured SC framework exhibits strong correlation with human expert evaluation (ρ = 0.87),
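To make the idea of quantifying consistency through ensemble agreement and entropy concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes that several reasoning chains have already been sampled from a model, that each chain is a list of normalized strings (intermediate steps followed by the final answer), and that steps can be compared by position. The function names `consistency_scores` and `stepwise_agreement` are hypothetical, and the structural isomorphism check on reasoning graphs is not covered.

```python
from collections import Counter
from math import log


def consistency_scores(answers):
    """Return (majority answer, agreement rate, normalized entropy).

    `answers` is a list of hashable items, e.g. final answers from
    independently sampled reasoning chains (an assumption of this sketch).
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    total = len(answers)
    majority, majority_count = counts.most_common(1)[0]
    # Ensemble agreement: share of samples that match the majority answer.
    agreement = majority_count / total
    # Shannon entropy (in nats) of the empirical answer distribution.
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    # Divide by log(total), the largest entropy `total` samples can reach,
    # so the result lies in [0, 1]; a single sample has zero entropy.
    normalized = entropy / log(total) if total > 1 else 0.0
    return majority, agreement, normalized


def stepwise_agreement(chains):
    """Agreement rate at each position across sampled chains.

    Assumes steps are aligned by index, which is a simplification:
    real chains may differ in length and ordering.
    """
    if not chains:
        raise ValueError("need at least one chain")
    depth = min(len(chain) for chain in chains)
    return [
        consistency_scores([chain[i] for chain in chains])[1]
        for i in range(depth)
    ]


if __name__ == "__main__":
    chains = [
        ["x = 2", "y = 4", "6"],
        ["x = 2", "y = 4", "6"],
        ["x = 2", "y = 5", "7"],
    ]
    finals = [chain[-1] for chain in chains]
    print(consistency_scores(finals))
    print(stepwise_agreement(chains))
```

In this sketch, low agreement or high entropy at an intermediate position flags a step where the samples diverge, even when a majority final answer exists. That is one simple way to read the hierarchical verification described above.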

large language model, machine learning, natural language

Genre:
- Research Report
  - New Finding (0.68)
  - Experimental Study (0.66)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Cognitive Science > Problem Solving (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
