ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction

Jun-2-2025–arXiv.org Artificial Intelligence

Table I presents the quantitative evaluation of ProofNet++ across three distinct datasets. The FPSR (Final Proof Success Rate) metric shows that the system performs best on the mathlib-extract dataset with a 74.9% success rate, followed by miniF2F at 68.4%, and the HOL Light Testbed trailing at 63.5%. Similarly, the PPC (Proof Production Correctness) values align with this trend, indicating higher intermediate proof accuracy on mathlib-extract (88.0%) compared to the other datasets. The EDPT (Edit Distance to Proof Target) metric reveals that mathlib-extract proofs require fewer correction steps (2.4) than miniF2F (3.2) and HOL Light (4.0), suggesting that the system is more efficient in approximating correct proofs in that domain. Latency measurements reflect verifier runtime, with mathlib-extract exhibiting the fastest average verification time (176 ms), whereas HOL Light has the highest latency (214 ms). Lastly, the average proof length varies notably, with HOL Light proofs being the longest (14.3 steps), potentially contributing to its higher latency and lower success metrics. These results indicate that while ProofNet++ demonstrates strong performance on established libraries like mathlib-extract, there is room for improvement on datasets with more complex or longer proofs, such as HOL Light. Enhancements could focus on optimizing proof search strategies and reducing verifier latency, particularly for longer proofs, to improve overall efficiency and success rates. E. Benchmark Pipeline Overview Figure 1 illustrates the full evaluation pipeline used to benchmark ProofNet++, from the initial input prompt to the final corrected proof output.

large language model, logic & formal reasoning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

Jun-2-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Logic & Formal Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks (0.69)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found