AITopics | lean 3

Collaborating Authors

lean 3

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

Neural Information Processing SystemsOct-9-2025, 19:17:38 GMT

Automating mathematical reasoning is a longstanding goal in artificial intelligence (Newell et al., 1957). A prominent line of work on the problem (Li et al., 2024) uses neural models to direct

benchmark, formalization, lean 4, (13 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.67)

Industry: Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

Tsoukalas, George, Lee, Jasper, Jennings, John, Xin, Jimmy, Ding, Michelle, Jennings, Michael, Thakur, Amitayush, Chaudhuri, Swarat

arXiv.org Artificial IntelligenceJul-15-2024

We present PutnamBench, a new multilingual benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1697 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the theorems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. Proving the theorems requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available at https://github.com/trishullab/PutnamBench.

formalization, isabelle, lean 4, (14 more...)

arXiv.org Artificial Intelligence

2407.11214

Country:

North America > United States (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.64)

Industry: Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data

Huang, Yinya, Lin, Xiaohan, Liu, Zhengying, Cao, Qingxing, Xin, Huajian, Wang, Haiming, Li, Zhenguo, Song, Linqi, Liang, Xiaodan

arXiv.org Artificial IntelligenceMay-22-2024

Recent large language models (LLMs) have witnessed significant advancement in various tasks, including mathematical reasoning and theorem proving. As these two tasks require strict and formal multi-step inference, they are appealing domains for exploring the reasoning ability of LLMs but still face important challenges. Previous studies such as Chain-of-Thought (CoT) have revealed the effectiveness of intermediate steps guidance. However, such step-wise annotation requires heavy labor, leading to insufficient training steps for current benchmarks. To fill this gap, this work introduces MUSTARD, a data generation framework that masters uniform synthesis of theorem and proof data of high quality and diversity. MUSTARD synthesizes data in three stages: (1) It samples a few mathematical concept seeds as the problem category. (2) Then, it prompts a generative language model with the sampled concepts to obtain both the problems and their step-wise formal solutions. (3) Lastly, the framework utilizes a proof assistant (e.g., Lean Prover) to filter the valid proofs. With the proposed MUSTARD, we present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points. Each data point contains an informal statement, an informal proof, and a translated formal proof that passes the prover validation. We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data. We further apply the MUSTARDSAUCE for fine-tuning smaller language models. The fine-tuned Llama 2-7B achieves a 15.41% average relative performance gain in automated theorem proving, and 8.18% in math word problems. Codes and data are available at https://github.com/Eleanor-H/MUSTARD.

conference paper, equation, language model, (14 more...)

arXiv.org Artificial Intelligence

2402.08957

Country:

Africa > Rwanda > Kigali > Kigali (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > Jordan (0.04)
(8 more...)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.67)
Instructional Material > Course Syllabus & Notes (0.46)

Industry: Education > Educational Setting > K-12 Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages

Mercer, Johnathan

arXiv.org Artificial IntelligenceFeb-12-2024

Formal mathematics is the discipline of translating mathematics into a programming language in which any statement can be unequivocally checked by a computer. Mathematicians and computer scientists have spent decades of painstaking formalization efforts developing languages such as Coq, HOL, and Lean. Machine learning research has converged on these formal math corpora and given rise to an assortment of methodologies to aid in interactive and automated theorem proving. However, these papers have primarily focused on one method, for one proof task, in one language. This paper introduces EvoGPT-f: a novel evolutionary framework for the first systematic quantitative analysis of the differential machine learnability of five formal math corpora (Lean 3, Lean 4, Coq, HOL 4, HOL Light) using four tokenization methods (character, word-level, Byte Pair Encoding and StarCoder tokenizer). This paper does not put to rest the question of the "best" or "easiest" language to learn. Rather, this framework and preliminary findings begin to illuminate the differential machine learnability of these languages, offering a foundation to forge more systematic quantitative and qualitative comparative research across communities.

character lean starcoder lean 4, lean 3, lean 4, (8 more...)

arXiv.org Artificial Intelligence

2402.16878

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > North Carolina > Wake County > Morrisville (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report (0.51)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
(2 more...)

Add feedback