proof assistant
MLFMF: Data Sets for Machine Learning for Mathematical Formalization
We introduce MLFMF, a collection of data sets for benchmarking recommendation systems used to support formalization of mathematics with proof assistants. These systems help humans identify which previous entries (theorems, constructions, datatypes, and postulates) are relevant in proving a new theorem or carrying out a new construction. Each data set is derived from a library of formalized mathematics written in the proof assistant Agda or Lean. The collection includes the largest Lean 4 library Mathlib, and some of the largest Agda libraries: the standard library, the library of univalent mathematics Agda-unimath, and the TypeTopology library. Each data set represents the corresponding library in two ways: as a heterogeneous network, and as a list of s-expressions representing the syntax trees of all the entries in the library. The network contains the (modular) structure of the library and the references between entries, while the s-expressions give complete and easily parsed information about every entry. We report baseline results using standard graph and word embeddings, tree ensembles, and instance-based learning algorithms. The MLFMF data sets provide solid benchmarking support for further investigation of the numerous machine learning approaches to formalized mathematics. The methodology used to extract the networks and the s-expressions readily applies to other libraries, and is applicable to other proof assistants. With more than $250\,000$ entries in total, this is currently the largest collection of formalized mathematical knowledge in machine-learnable format.
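As a rough illustration of why s-expressions count as "easily parsed", the following minimal sketch reads one into a nested Python list and collects the atoms that occur in an entry's body. The tag names (`entry`, `name`, `body`, `apply`) and the example entry are hypothetical and are not taken from the MLFMF data sets, whose actual schema may differ.

```python
from typing import List, Union

# An s-expression is either an atom (str) or a list of s-expressions.
SExpr = Union[str, List["SExpr"]]


def tokenize(text: str) -> List[str]:
    """Split the text into parentheses and whitespace-separated atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: List[str]) -> SExpr:
    """Parse one s-expression from the front of `tokens`, consuming them."""
    if not tokens:
        raise ValueError("unexpected end of input")
    token = tokens.pop(0)
    if token == ")":
        raise ValueError("unexpected ')'")
    if token != "(":
        return token  # a bare atom
    node: List[SExpr] = []
    while tokens and tokens[0] != ")":
        node.append(parse(tokens))
    if not tokens:
        raise ValueError("missing ')'")
    tokens.pop(0)  # drop the closing parenthesis
    return node


def atoms(tree: SExpr) -> List[str]:
    """Return all atoms of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    found: List[str] = []
    for child in tree:
        found.extend(atoms(child))
    return found


# Hypothetical entry; the tags below are illustrative assumptions only.
example = "(entry (name plus-comm) (body (apply plus-assoc (apply refl zero))))"
tree = parse(tokenize(example))
body_atoms = atoms(tree[2])
```

In a sketch like this, the atoms found in a body that name other entries would be candidates for reference edges in the network representation.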
A Case Study on the Effectiveness of LLMs in Verification with Proof Assistants
Bayazıt, Barış, Li, Yao, Si, Xujie
Large language models (LLMs) can potentially help with verification using proof assistants by automating proofs. However, it is unclear how effective LLMs are at this task. In this paper, we perform a case study based on two mature Rocq projects: the hs-to-coq tool and Verdi. We evaluate the effectiveness of LLMs in generating proofs through both quantitative and qualitative analysis. Our study finds that: (1) external dependencies and context in the same source file can significantly help proof generation; (2) LLMs perform well on small proofs but can also generate large proofs; (3) LLMs perform differently on different verification projects; and (4) LLMs can generate concise and smart proofs and apply classical techniques to new definitions, but can also make odd mistakes.
Thinking Machines: Mathematical Reasoning in the Age of LLMs
Asperti, Andrea, Naibo, Alberto, Coen, Claudio Sacerdoti
Large Language Models (LLMs) have shown remarkable abilities in structured reasoning and symbolic tasks, with coding emerging as a particular area of strength. This success has sparked growing interest in applying LLMs to mathematics, both in informal problem-solving and formal theorem proving. However, progress in formal mathematics has proven to be significantly more difficult, despite surface-level similarities between programming and proof construction. This discrepancy raises important questions about how LLMs ``reason'', how they are supervised, and whether they internally track a notion of computational or deductive state. In this article, we survey the state of the art of the discipline, focusing on recent models and benchmarks, and explore three central issues at the intersection of machine learning and mathematical cognition: (i) the trade-offs between formal and informal mathematics as training domains; (ii) the deeper reasons why proof generation remains more brittle than code synthesis; and (iii) the question of whether LLMs represent, or merely mimic, a notion of evolving logical state. Our goal is not to draw hard boundaries, but to identify where the current limits lie, and how they might be extended.
Transfinite Fixed Points in Alpay Algebra as Ordinal Game Equilibria in Dependent Type Theory
Alpay, Faruk, Kilictas, Bugra, Alpay, Taylan
This paper contributes to the Alpay Algebra by demonstrating that the stable outcome of a self-referential process, obtained by iterating a transformation through all ordinal stages, is identical to the unique equilibrium of an unbounded revision dialogue between a system and its environment. The analysis first shows how classical fixed-point theorems guarantee such convergence in finite settings and then extends the argument to the transfinite domain, relying on well-founded induction and principles of order-theoretic continuity. Furthermore, the resulting transordinal fixed-point operator is embedded into dependent type theory, a formalization which permits every step of the transfinite iteration and its limit to be verified within a modern proof assistant. This procedure yields a machine-checked proof that the iterative dialogue necessarily stabilizes and that its limit is unique. The result provides a foundation for Alpay's philosophical claim of semantic convergence within the framework of constructive logic. By unifying concepts from fixed-point theory, game semantics, ordinal analysis, and type theory, this research establishes a broadly accessible yet formally rigorous foundation for reasoning about infinite self-referential systems and offers practical tools for certifying their convergence within computational environments.
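For orientation, "iterating a transformation through all ordinal stages" is commonly made precise along the following lines. This is the standard textbook construction, stated under the assumptions listed in the comments; the operator and hypotheses actually used in the paper may differ.

```latex
% Assumptions (not taken from the paper): (X, \le) is a partially ordered set
% in which every chain has a supremum, f : X \to X is monotone, and the
% starting point x_0 satisfies x_0 \le f(x_0).
\begin{align*}
  x_{\alpha+1} &= f(x_\alpha)
    && \text{(successor stage)} \\
  x_\lambda &= \sup_{\beta < \lambda} x_\beta
    && \text{(limit ordinal } \lambda\text{)}
\end{align*}
% By transfinite induction the sequence (x_\alpha) is increasing. Since X is a
% set, it cannot increase strictly at every ordinal, so some stage \gamma
% satisfies f(x_\gamma) = x_{\gamma+1} = x_\gamma: x_\gamma is a fixed point.
```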