Logic & Formal Reasoning
PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition
Tsoukalas, George, Lee, Jasper, Jennings, John, Xin, Jimmy, Ding, Michelle, Jennings, Michael, Thakur, Amitayush, Chaudhuri, Swarat
We present PutnamBench, a new multilingual benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1697 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the theorems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. Proving the theorems requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available at https://github.com/trishullab/PutnamBench.
Lean-STaR: Learning to Interleave Thinking and Proving
Lin, Haohan, Sun, Zhiqing, Yang, Yiming, Welleck, Sean
Traditional language model-based theorem proving assumes that by training on a sufficient amount of formal proof data, a model will learn to prove theorems. Our key observation is that a wealth of informal information that is not present in formal proofs can be useful for learning to prove theorems. For instance, humans think through steps of a proof, but this thought process is not visible in the resulting code. We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof, thereby boosting the model's theorem-proving capabilities. Lean-STaR uses retrospective ground-truth tactics to generate synthetic thoughts for training the language model. At inference time, the trained model directly generates the thoughts prior to the prediction of the tactics in each proof step. Building on the self-taught reasoner framework, we then apply expert iteration to further fine-tune the model on the correct proofs it samples and verifies using the Lean solver. Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment, significantly outperforming base models (43.4% 46.3%, Pass@64). We also analyze the impact of the augmented thoughts on various aspects of the theorem proving process, providing insights into their effectiveness.
Static Analysis of Logic Programs via Boolean Networks
Trinh, Van-Giang, Benhamou, Belaid
Answer Set Programming (ASP) is a declarative problem solving paradigm that can be used to encode a combinatorial problem as a logic program whose stable models correspond to the solutions of the considered problem. ASP has been widely applied to various domains in AI and beyond. The question "What can be said about stable models of a logic program from its static information?" has been investigated and proved useful in many circumstances. In this work, we dive into this direction more deeply by making the connection between a logic program and a Boolean network, which is a prominent modeling framework with applications to various areas. The proposed connection can bring the existing results in the rich history on static analysis of Boolean networks to explore and prove more theoretical results on ASP, making it become a unified and powerful tool to further study the static analysis of ASP. In particular, the newly obtained insights have the potential to benefit many problems in the field of ASP.
Logical Characterizations of Recurrent Graph Neural Networks with Reals and Floats
Ahvonen, Veeti, Heiman, Damian, Kuusisto, Antti, Lutz, Carsten
In pioneering work from 2019, Barcel\'o and coauthors identified logics that precisely match the expressive power of constant iteration-depth graph neural networks (GNNs) relative to properties definable in first-order logic. In this article, we give exact logical characterizations of recurrent GNNs in two scenarios: (1) in the setting with floating-point numbers and (2) with reals. For floats, the formalism matching recurrent GNNs is a rule-based modal logic with counting, while for reals we use a suitable infinitary modal logic, also with counting. These results give exact matches between logics and GNNs in the recurrent setting without relativising to a background logic in either case, but using some natural assumptions about floating-point arithmetic. Applying our characterizations, we also prove that, relative to graph properties definable in monadic second-order logic (MSO), our infinitary and rule-based logics are equally expressive. This implies that recurrent GNNs with reals and floats have the same expressive power over MSO-definable properties and shows that, for such properties, also recurrent GNNs with reals are characterized by a (finitary!) rule-based modal logic. In the general case, in contrast, the expressive power with floats is weaker than with reals. In addition to logic-oriented results, we also characterize recurrent GNNs, with both reals and floats, via distributed automata, drawing links to distributed computing models.
Predictable and Performant Reactive Synthesis Modulo Theories via Functional Synthesis
Rodríguez, Andoni, Gorostiaga, Felipe, Sánchez, César
Reactive synthesis is the process of generating correct controllers from temporal logic specifications. Classical LTL reactive synthesis handles (propositional) LTL as a specification language. Boolean abstractions allow reducing LTLt specifications (i.e., LTL with propositions replaced by literals from a theory calT), into equi-realizable LTL specifications. In this paper we extend these results into a full static synthesis procedure. The synthesized system receives from the environment valuations of variables from a rich theory calT and outputs valuations of system variables from calT. We use the abstraction method to synthesize a reactive Boolean controller from the LTL specification, and we combine it with functional synthesis to obtain a static controller for the original LTLt specification. We also show that our method allows responses in the sense that the controller can optimize its outputs in order to e.g., always provide the smallest safe values. This is the first full static synthesis method for LTLt, which is a deterministic program (hence predictable and efficient).
Bi-reachability in Petri nets with data
Kamiński, Łukasz, Lasota, Sławomir
We investigate Petri nets with data, an extension of plain Petri nets where tokens carry values from an infinite data domain, and executability of transitions is conditioned by equalities between data values. We provide a decision procedure for the bi-reachability problem: given a Petri net and its two configurations, we ask if each of the configurations is reachable from the other. This pushes forward the decidability borderline, as the bi-reachability problem subsumes the coverability problem (which is known to be decidable) and is subsumed by the reachability problem (whose decidability status is unknown).
Automata-based constraints for language model decoding
Koo, Terry, Liu, Frederick, He, Luheng
Language models (LMs) are often expected to generate strings in some formal language; for example, structured data, API calls, or code snippets. Although LMs can be tuned to improve their adherence to formal syntax, this does not guarantee conformance, especially with smaller LMs suitable for large-scale deployment. In addition, tuning requires significant resources, making it impractical for uncommon or task-specific formats. To prevent downstream parsing errors we would ideally constrain the LM to only produce valid output, but this is severely complicated by tokenization, which is typically both ambiguous and misaligned with the formal grammar. We solve these issues through the application of automata theory, deriving an efficient closed-form solution for the regular languages, a broad class of formal languages with many practical applications, including API calls or schema-guided JSON and YAML. We also discuss pragmatic extensions for coping with the issue of high branching factor. Finally, we extend our techniques to deterministic context-free languages, which similarly admit an efficient closed-form solution. In spite of its flexibility and representative power, our approach only requires access to per-token decoding logits and lowers into simple calculations that are independent of LM size, making it both efficient and easy to apply to almost any LM architecture.
AbstractBeam: Enhancing Bottom-Up Program Synthesis using Library Learning
Zenkner, Janis, Dierkes, Lukas, Sesterhenn, Tobias, Bartelt, Chrisitan
LambdaBeam is a state-of-the-art execution-guided algorithm for program synthesis that incorporates higher-order functions, lambda functions, and iterative loops into the Domain-Specific Language (DSL). LambdaBeam generates every program from the start. Yet, many program blocks or subprograms occur frequently in a given domain, e.g., loops to traverse a list. Thus, repeating programs can be used to enhance the synthesis algorithm. However, LambdaBeam fails to leverage this potential. For this purpose, we introduce AbstractBeam: A novel program synthesis framework that employs Library Learning to identify such program repetitions, integrates them into the DSL, and thus utilizes their potential to boost LambdaBeam's synthesis algorithm. Our experimental evaluations demonstrate that AbstractBeam significantly improves LambdaBeam's performance in the LambdaBeam integer list manipulation domain. Additionally, AbstractBeam's program generation is more efficient compared to LambdaBeam's synthesis. Finally, our findings indicate that Library Learning is effective in domains not specifically crafted to highlight its benefits.
Historical Review of Variants of Informal Semantics for Logic Programs under Answer Set Semantics: GL'88, GL'91, GK'14, D-V'12
This note presents a historical survey of informal semantics that are associated with logic programming under answer set semantics. We review these in uniform terms and align them with two paradigms: Answer Set Programming and ASP-Prolog -- two prominent Knowledge Representation and Reasoning Paradigms in Artificial Intelligence. Under consideration in Theory and Practice of Logic Programming (TPLP).
Regularization in Spider-Style Strategy Discovery and Schedule Construction
Bártek, Filip, Chvalovský, Karel, Suda, Martin
To achieve the best performance, automatic theorem provers often rely on schedules of diverse proving strategies to be tried out (either sequentially or in parallel) on a given problem. In this paper, we report on a large-scale experiment with discovering strategies for the Vampire prover, targeting the FOF fragment of the TPTP library and constructing a schedule for it, based on the ideas of Andrei Voronkov's system Spider. We examine the process from various angles, discuss the difficulty (or ease) of obtaining a strong Vampire schedule for the CASC competition, and establish how well a schedule can be expected to generalize to unseen problems and what factors influence this property.