Logic & Formal Reasoning
Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis
Liu, Haoxiong, Sun, Jiacheng, Li, Zhenguo, Yao, Andrew C
The synergy between deep learning models and traditional automation tools plays a pivotal role in developing robust neural theorem provers (NTPs). However, for proof synthesis with LLMs, previous work applies automation tools either only when the model explicitly calls the method, or only at a single granularity level, failing to fully exploit the power of built-in tactics and off-the-shelf automated theorem provers. In this work, we propose ProofAug, a novel theorem proving method that enjoys superior sample efficiency through equipping proof-generation LLMs with automation methods in different granularities via fine-grained structure analysis of model-generated proof proposals. Furthermore, ProofAug serves as a versatile plug-and-play module that seamlessly integrates with any tree-search algorithm, enabling our construction of an efficient recursive proving (ERP) module to further enhance performance. The superiority of our method is validated on the miniF2F-test benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant. Notably, by additionally employing a mixed prompting strategy, we achieve a cumulative pass rate of 66.0% after curation of the dataset (61.9% for the original version), setting a new SOTA across all proof languages with a total sample budget of only 2100. Our code is available at https://github.com/haoxiongliu/ProofAug.
SemML: Enhancing Automata-Theoretic LTL Synthesis with Machine Learning
Kretinsky, Jan, Meggendorfer, Tobias, Prokop, Maximilian, Zarkhah, Ashkan
Synthesizing a reactive system from specifications given in linear temporal logic (LTL) is a classical problem, finding its applications in safety-critical systems design. We present our tool SemML, which won this year's LTL realizability tracks of SYNTCOMP, after years of domination by Strix. While both tools are based on the automata-theoretic approach, ours relies heavily on (i) Semantic labelling, additional information of logical nature, coming from recent LTL-to-automata translations and decorating the resulting parity game, and (ii) Machine-Learning approaches turning this information into a guidance oracle for on-the-fly exploration of the parity game (whence the name SemML). Our tool fills the missing gaps of previous suggestions to use such an oracle and provides an efficeint implementation with additional algorithmic improvements. We evaluate SemML both on the entire set of SYNTCOMP as well as a synthetic data set, compare it to Strix, and analyze the advantages and limitations. As SemML solves more instances on SYNTCOMP and does so significantly faster on larger instances, this demonstrates for the first time that machine-learning-aided approaches can out-perform state-of-the-art tools in real LTL synthesis.
Certifying Pareto-Optimality in Multi-Objective Maximum Satisfiability
Jabs, Christoph, Berg, Jeremias, Bogaerts, Bart, Järvisalo, Matti
Due to the wide employment of automated reasoning in the analysis and construction of correct systems, the results reported by automated reasoning engines must be trustworthy. For Boolean satisfiability (SAT) solvers - and more recently SAT-based maximum satisfiability (MaxSAT) solvers - trustworthiness is obtained by integrating proof logging into solvers, making solvers capable of emitting machine-verifiable proofs to certify correctness of the reasoning steps performed. In this work, we enable for the first time proof logging based on the VeriPB proof format for multi-objective MaxSAT (MO-MaxSAT) optimization techniques. Although VeriPB does not offer direct support for multi-objective problems, we detail how preorders in VeriPB can be used to provide certificates for MO-MaxSAT algorithms computing a representative solution for each element in the non-dominated set of the search space under Pareto-optimality, without extending the VeriPB format or the proof checker. By implementing VeriPB proof logging into a state-of-the-art multi-objective MaxSAT solver, we show empirically that proof logging can be made scalable for MO-MaxSAT with reasonable overhead.
Instantiation-based Formalization of Logical Reasoning Tasks using Language Models and Logical Solvers
Raza, Mohammad, Milic-Frayling, Natasa
Robustness of reasoning remains a significant challenge for large language models, and addressing it is essential for the practical applicability of AI-driven reasoning systems. We introduce Semantic Self-Verification (SSV), a novel approach that addresses the key challenge in combining language models with the rigor of logical solvers: to accurately formulate the reasoning problem from natural language to the formal language of the solver. SSV uses a consistency-based approach to produce strong abstract formalizations of problems using concrete instantiations that are generated by the model and verified by the solver. In addition to significantly advancing the overall reasoning accuracy over the state-of-the-art, a key novelty that this approach presents is a feature of verification that has near-perfect precision over a significant coverage of cases, as we demonstrate on open reasoning benchmarks. We propose such *near-certain reasoning* as a new approach to reduce the need for manual verification in many cases, taking us closer to more dependable and autonomous AI reasoning systems.
What is Formal Verification without Specifications? A Survey on mining LTL Specifications
Virtually all verification techniques using formal methods rely on the availability of a formal specification, which describes the design requirements precisely. However, formulating specifications remains a manual task that is notoriously challenging and error-prone. To address this bottleneck in formal verification, recent research has thus focussed on automatically generating specifications for formal verification from examples of (desired and undesired) system behavior. In this survey, we list and compare recent advances in mining specifications in Linear Temporal Logic (LTL), the de facto standard specification language for reactive systems. Several approaches have been designed for learning LTL formulas, which address different aspects and settings of specification design. Moreover, the approaches rely on a diverse range of techniques such as constraint solving, neural network training, enumerative search, etc. We survey the current state-of-the-art techniques and compare them for the convenience of the formal methods practitioners.
Review for NeurIPS paper: Program Synthesis with Pragmatic Communication
Summary and Contributions: This paper frames interactive program synthesis as a reference game played between a demonstrator and a synthesizer. The setting is constructing patterns on a 2D grid, where the demonstration iteratively constructs a pattern by placing symbols, and the synthesizer infers a program to complete the output based on the symbols produced so far. The paper applies recursive pragmatic models from the rational speech acts (RSA) framework, deriving a pragmatic synthesizer that models the demonstrator's intention in choosing symbols. This is done by alternating renormalization over demonstrations and over programs, using full enumeration of the set of possible programs and memoization of probabilities. The paper compares pragmatic and non-pragmatic synthesizers, with humans playing the role of the demonstrators. Pragmatic synthesizers have significantly more efficient interactions: the human needs to place fewer symbols on average in order to correctly get the synthesizer to infer the pattern the person was attempting to demonstrate.
Review for NeurIPS paper: Program Synthesis with Pragmatic Communication
This paper studies the problem of programming by example via the lens of rational communication: how can we synthesize programs assuming humans are providing examples in a rational communcation framework? There are some significant weaknesses in the computational aspects of the paper, where it depends on explicit enumeration that limits its scalability. Having said that, reviewers (and AC) are in agreement that this is an interesting new idea that is worth publishing. I agree with R4's updated assessment that "Upon reflection, I think that encouraging work that take into account the human factor in synthesis could be a positive for the NeurIPS community."
Review for NeurIPS paper: Learning Compositional Rules via Neural Program Synthesis
The paper claims that the model "learn[s] entire rule systems from a small set of examples". I'm not convinced that this is the case in this work and neither in the previous work which this one extends (i.e. Both methods heavily rely on the supporting set and the specific neural attention architecture of the encoder and decoder which allow for the replacement of individual tokens. This allows the model to exploit a certain pattern in the support set e.g. "a b c - a c a" by replacing the "a" and "b" on-the-fly and execute the abstract rule given by the supporting set.
Reviews: Write, Execute, Assess: Program Synthesis with a REPL
"Given a large enough time budget the'no REPL' baseline is competitive with our ablated alternatives." However, the policy rollout baseline is trained with RL using a single machine, making it difficult to explore using entropy based methods or epsilon greedy. However, using multiple actors in an asynchronous setting would be a stronger/fairer baseline (and then doing policy rollouts) to the SMC approach. I expect SMC to do well but this is an important empirical question (other methods cited like Ganin et al. seem to do this in the same context). "The value-guided SMC sampler leads to the highest overall number of correct programs, requiring less time and fewer nodes expanded compared to other inference techniques. " - how well does a SMC sampler work without value guided proposals for both case studies?