mathematical statement
TopoAlign: A Framework for Aligning Code to Math via Topological Decomposition
Li, Yupei, Borchert, Philipp, Lampouras, Gerasimos
Large Language Models (LLMs) excel at both informal and formal (e.g. Lean 4) mathematical reasoning but still struggle with autoformalisation, the task of transforming informal into formal mathematical statements. Autoformalisation helps pair the informal reasoning of LLMs with formal proof assistants which enable machine-verifiable generation and mitigate hallucinations. Yet, the performance of current Math LLMs is constrained by the scarcity of large-scale corpora, particularly those containing pairs of informal and formal statements. Although current models are trained to generate code from natural language instructions, structural and syntactic differences between these and formal mathematics limit effective transfer learning. We propose TopoAlign, a framework that unlocks widely available code repositories as training resources for Math LLMs. TopoAlign decomposes code into docstrings, main functions, and dependency functions, and reassembles these components into analogues that structurally mirror formal statements. This produces structurally aligned code data that can be used for training Math LLMs without requiring additional human annotation. We train two state-of-the-art models, DeepSeek-Math and Herald, and evaluate them on the minif2f, Putnam, and ProofNet benchmarks. TopoAlign provides substantial gains for DeepSeek-Math, improving performance by 17.77% on BEq@10 and 68.82% on typecheck@10. Despite introducing no new mathematical knowledge, our framework achieves gains of 0.12% and 1.09% for Herald on BEq@10 and typecheck@10, respectively, demonstrating that training on aligned code data is beneficial even for specialized models.
Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification
Rao, Balaji, Eiers, William, Lipizzi, Carlo
Formally verifying properties of software code has been a highly desirable task, especially with the emergence of LLM-generated code. In the same vein, they provide an interesting avenue for the exploration of formal verification and mechanistic interpretability. Since the introduction of code-specific models, despite their successes in generating code in Lean4 and Isabelle, the task of generalized theorem proving still remains far from being fully solved and will be a benchmark for reasoning capability in LLMs. In this work, we introduce a framework that generates whole proofs in a formal language to be used within systems that utilize the power of built-in tactics and off-the-shelf automated theorem provers. Our framework includes 3 components: generating natural language statements of the code to be verified, an LLM that generates formal proofs for the given statement, and a module employing heuristics for building the final proof. To train the LLM, we employ a 2-stage fine-tuning process, where we first use SFT-based training to enable the model to generate syntactically correct Isabelle code and then RL-based training that encourages the model to generate proofs verified by a theorem prover. We validate our framework using the miniF2F-test benchmark and the Isabelle proof assistant and design a use case to verify the correctness of the A WS S3 bucket access policy code.
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
Peng, Zhongyuan, Yao, Yifan, Ma, Kaijing, Guo, Shuyue, Li, Yizhe, Zhang, Yichi, Zhang, Chenchen, Zhang, Yifan, Yu, Zhouliang, Li, Luming, Liu, Minghao, Xia, Yihang, Shen, Jiawei, Wu, Yuchen, Cao, Yixin, Zhang, Zhaoxiang, Huang, Wenhao, Liu, Jiaheng, Zhang, Ge
Translating natural language mathematical statements into formal, executable code is a fundamental challenge in automated theorem proving. While prior work has focused on generation and compilation success, little attention has been paid to the critic phase-the evaluation of whether generated formalizations truly capture the semantic intent of the original problem. In this paper, we introduce CriticLean, a novel critic-guided reinforcement learning framework that elevates the role of the critic from a passive validator to an active learning component. Specifically, first, we propose the CriticLeanGPT, trained via supervised fine-tuning and reinforcement learning, to rigorously assess the semantic fidelity of Lean 4 formalizations. Then, we introduce CriticLeanBench, a benchmark designed to measure models' ability to distinguish semantically correct from incorrect formalizations, and demonstrate that our trained CriticLeanGPT models can significantly outperform strong open- and closed-source baselines. Building on the CriticLean framework, we construct FineLeanCorpus, a dataset comprising over 285K problems that exhibits rich domain diversity, broad difficulty coverage, and high correctness based on human evaluation. Overall, our findings highlight that optimizing the critic phase is essential for producing reliable formalizations, and we hope our CriticLean will provide valuable insights for future advances in formal mathematical reasoning.
Formal Language Knowledge Corpus for Retrieval Augmented Generation
Abstract-The integration of retrieval-augmented techniques with LLMs has shown promise in improving performance across various domains. However, their utility in tasks requiring advanced reasoning, such as generating and evaluating mathematical statements and proofs, remains underexplored. This study explores the use of Lean, a programming language for writing mathematical proofs, to populate the knowledge corpus used by RAG systems. We hope for this to lay the foundation to exploring different methods of using RAGs to improve the performance of LLMs in advanced logical reasoning tasks. Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, but they still face significant challenges, particularly in generating accurate and reliable information. One of the key issues is their tendency to produce hallucinated or incorrect responses.
What Came First, Math or Computing?
One of the most fundamental conundrums in the philosophy of mathematics is the question of whether mathematics was discovered by humans or invented by them. On one hand, it seems hard to argue that highly sophisticated mathematical objects, such as inaccessible cardinals, were discovered. On the other hand, as Albert Einstein asked, "How can it be that mathematics, being after all a product of human thought, which is independent of experience, is so admirably appropriate to the objects of reality?" The 19th century mathematician Leopold Kronecker offered a compromise, saying "God created the integers, all else is the work of man." So let us consider the natural numbers.
OntoMath${}^{\mathbf{PRO}}$ 2.0 Ontology: Updates of the Formal Model
Kirillovich, Alexander, Nevzorova, Olga, Lipachev, Evgeny
This paper is devoted to the problems of ontology-based mathematical knowledge management and representation. The main attention is paid to the development of a formal model for the representation of mathematical statements in the Open Linked Data cloud. The proposed model is intended for applications that extract mathematical facts from natural language mathematical texts and represent these facts as Linked Open Data. The model is used in development of a new version of the OntoMath${}^{\mathrm{PRO}}$ ontology of professional mathematics is described. OntoMath${}^{\mathrm{PRO}}$ underlies a semantic publishing platform, that takes as an input a collection of mathematical papers in LaTeX format and builds their ontology-based Linked Open Data representation. The semantic publishing platform, in turn, is a central component of OntoMath digital ecosystem, an ecosystem of ontologies, text analytics tools, and applications for mathematical knowledge management, including semantic search for mathematical formulas and a recommender system for mathematical papers. According to the new model, the ontology is organized into three layers: a foundational ontology layer, a domain ontology layer and a linguistic layer. The domain ontology layer contains language-independent math concepts. The linguistic layer provides linguistic grounding for these concepts, and the foundation ontology layer provides them with meta-ontological annotations. The concepts are organized in two main hierarchies: the hierarchy of objects and the hierarchy of reified relationships.
Kaliszyk
Large formal mathematical knowledge bases encode considerable parts of advanced mathematics and exact science, allowing deep semantic computer assistance and verification of complicated theories down to the atomic logical rules. An essential part of automated reasoning over such large theories are methods learning selection of relevant knowledge from the thousands of proofs in the corpora. Such methods in turn rely on efficiently computable features characterizing the highly structured and inter-related mathematical statements. In this work we (i) propose novel semantic features characterizing the statements in such large semantic knowledge bases, (ii) propose and carry out their efficient implementation using deductive-AI data-structures such as substitution trees and discrimination nets, and (iii) show that they significantly improve the strength of existing knowledge selection methods and automated reasoning methods over the large formal knowledge bases. In particular, on a standard large-theory benchmark we improve the average predicted rank of a mathematical statement needed for a proof by 22% in comparison with state of the art. This allows us to prove 8% more theorems in comparison with state of the art.
Goedel's Incompleteness Theorem and the Emergence of AI
In 1931 at the age of just 25 years, the young Austrian mathematician Kurt Goedel proved an astonishing mathematical theorem that made him instantly famous and a celebrity in mathematical circles around the world (see picture; in the Anglo-Saxon literature Goedel is usually referred to as "Godel" skipping the Umlaut in his German name). Despite its very abstract nature and the lack of any every day practical use, Goedel's theorem - the so called Incompleteness Theorem - has had a dramatic and deep impact on mathematics itself and its foundations. It also had a substantial impact on the philosophy of the 20th century and our understanding of the general limitations of computers and algorithms. I will explain here how Goedel's theorem actually caused the emergence of the new science of Artificial Intelligence (AI) and theoretical computer science in the 1940s and 1950s and how it motivated such key AI pioneers like Alan Turing to get involved. The birth of AI and the course AI has taken ...