Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks

Ganguly, Debargha, Singh, Vikash, Sankar, Sreehari, Zhang, Biyao, Zhang, Xuecen, Iyengar, Srinivasan, Han, Xiaotian, Sharma, Amit, Kalyanaraman, Shivkumar, Chaudhary, Vipin

May-27-2025–arXiv.org Artificial Intelligence

Large language models (LLMs) show remarkable promise for democratizing automated reasoning by generating formal specifications. However, a fundamental tension exists: LLMs are probabilistic, while formal verification demands deterministic guarantees. This paper addresses this epistemological gap by comprehensively investigating failure modes and uncertainty quantification (UQ) in LLM-generated formal artifacts. Our systematic evaluation of five frontier LLMs reveals Satisfiability Modulo Theories (SMT) based autoformalization's domain-specific impact on accuracy (from +34.8% on logical tasks to -44.5% on factual ones), with known UQ techniques like the entropy of token probabilities failing to identify these errors. We introduce a probabilistic context-free grammar (PCFG) framework to model LLM outputs, yielding a refined uncertainty taxonomy. We find uncertainty signals are task-dependent (e.g., grammar entropy for logic, AUROC>0.93). Finally, a lightweight fusion of these signals enables selective verification, drastically reducing errors (14-100%) with minimal abstention, transforming LLM-driven formalization into a reliable engineering discipline.

large language model, logic & formal reasoning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

May-27-2025

arXiv.org PDF

Add feedback

Country:
- Asia > Middle East (0.28)

Genre:
- Research Report > New Finding (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Logic & Formal Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (0.94)
    - Performance Analysis > Accuracy (0.68)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found