SubgoalXL: Subgoal-based Expert Learning for Theorem Proving

Zhao, Xueliang, Zheng, Lin, Bo, Haige, Hu, Changran, Thakker, Urmish, Kong, Lingpeng

Aug-20-2024–arXiv.org Artificial Intelligence

Formal theorem proving, a field at the intersection of mathematics and computer science, has seen renewed interest with advancements in large language models (LLMs). This paper introduces SubgoalXL, a novel approach that synergizes subgoal-based proofs with expert learning to enhance LLMs' capabilities in formal theorem proving within the Isabelle environment. SubgoalXL addresses two critical challenges: the scarcity of specialized mathematics and theorem-proving data, and the need for improved multi-step reasoning abilities in LLMs. Leveraging the Isabelle environment's advantages in subgoal-based proofs, SubgoalXL achieves a new state-of-the-art performance of 56.1% in Isabelle on the standard miniF2F dataset, marking an absolute improvement of 4.9%. Notably, SubgoalXL successfully solves 41 AMC12, 9 AIME, and 3 IMO problems from miniF2F. These results underscore the effectiveness of maximizing limited data utility and employing targeted guidance for complex reasoning in formal theorem proving, contributing to the ongoing advancement of AI reasoning capabilities. Formal theorem proving, a field at the intersection of mathematics and computer science, has flourished alongside the development of languages like Lean (de Moura et al., 2015) and Isabelle (Paulson, 1994). These two prominent communities have been instrumental in advancing the field's core challenge: mechanizing mathematical reasoning and proof verification (Li et al., 2020).

dataset, generator, informal proof, (13 more...)

arXiv.org Artificial Intelligence

Aug-20-2024

arXiv.org PDF

Add feedback

Country:
- Europe > Germany
  - Berlin (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Indonesia > Java
    - Yogyakarta > Yogyakarta (0.04)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.69)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found