Logic & Formal Reasoning
Generating Global and Local Explanations for Tree-Ensemble Learning Methods by Answer Set Programming
Takemura, Akihiro, Inoue, Katsumi
We propose a method for generating rule sets as global and local explanations for tree-ensemble learning methods using Answer Set Programming (ASP). To this end, we adopt a decompositional approach where the split structures of the base decision trees are exploited in the construction of rules, which in turn are assessed using pattern mining methods encoded in ASP to extract explanatory rules. For global explanations, candidate rules are chosen from the entire trained tree-ensemble models, whereas for local explanations, candidate rules are selected by only considering rules that are relevant to the particular predicted instance. We show how user-defined constraints and preferences can be represented declaratively in ASP to allow for transparent and flexible rule set generation, and how rules can be used as explanations to help the user better understand the models. Experimental evaluation with real-world datasets and popular tree-ensemble algorithms demonstrates that our approach is applicable to a wide range of classification tasks.
3D-Prover: Diversity Driven Theorem Proving With Determinantal Point Processes
Lamont, Sean, Walder, Christian, Dezfouli, Amir, Montague, Paul, Norrish, Michael
A key challenge in automated formal reasoning is the intractable search space, which grows exponentially with the depth of the proof. This branching is caused by the large number of candidate proof tactics which can be applied to a given goal. Nonetheless, many of these tactics are semantically similar or lead to an execution error, wasting valuable resources in both cases. We address the problem of effectively pruning this search, using only synthetic data generated from previous proof attempts. We first demonstrate that it is possible to generate semantically aware tactic representations which capture the effect on the proving environment, likelihood of success and execution time. We then propose a novel filtering mechanism which leverages these representations to select semantically diverse and high quality tactics, using Determinantal Point Processes. Our approach, 3D-Prover, is designed to be general, and to augment any underlying tactic generator. We demonstrate the effectiveness of 3D-Prover on the miniF2F-valid and miniF2F-test benchmarks by augmenting the ReProver LLM. We show that our approach leads to an increase in the overall proof rate, as well as a significant improvement in the tactic success rate, execution time and diversity.
FormalAlign: Automated Alignment Evaluation for Autoformalization
Lu, Jianqiao, Wan, Yingjia, Huang, Yinya, Xiong, Jing, Liu, Zhengying, Guo, Zhijiang
Autoformalization aims to convert informal mathematical proofs into machine-verifiable formats, bridging the gap between natural and formal languages. However, ensuring semantic alignment between the informal and formalized statements remains challenging. Existing approaches heavily rely on manual verification, hindering scalability. To address this, we introduce \textsc{FormalAlign}, the first automated framework designed for evaluating the alignment between natural and formal languages in autoformalization. \textsc{FormalAlign} trains on both the autoformalization sequence generation task and the representational alignment between input and output, employing a dual loss that combines a pair of mutually enhancing autoformalization and alignment tasks. Evaluated across four benchmarks augmented by our proposed misalignment strategies, \textsc{FormalAlign} demonstrates superior performance. In our experiments, \textsc{FormalAlign} outperforms GPT-4, achieving an Alignment-Selection Score 11.58\% higher on \forml-Basic (99.21\% vs. 88.91\%) and 3.19\% higher on MiniF2F-Valid (66.39\% vs. 64.34\%). This effective alignment evaluation significantly reduces the need for manual verification. Both the dataset and code can be accessed via~\url{https://github.com/rookie-joe/FormalAlign}.
Learning Interpretable Classifiers for PDDL Planning
We consider the problem of synthesizing interpretable models that recognize the behaviour of an agent compared to other agents, on a whole set of similar planning tasks expressed in PDDL. Our approach consists in learning logical formulas, from a small set of examples that show how an agent solved small planning instances. These formulas are expressed in a version of First-Order Temporal Logic (FTL) tailored to our planning formalism. Such formulas are human-readable, serve as (partial) descriptions of an agent's policy, and generalize to unseen instances. We show that learning such formulas is computationally intractable, as it is an NP-hard problem. As such, we propose to learn these behaviour classifiers through a topology-guided compilation to MaxSAT, which allows us to generate a wide range of different formulas. Experiments show that interesting and accurate formulas can be learned in reasonable time.
Learning to Prove Theorems by Learning to Generate Theorems
We consider the task of automated theorem proving, a key AI task. Deep learning has shown promise for training theorem provers, but there are limited human-written theorems and proofs available for supervised learning. To address this limitation, we propose to learn a neural generator that automatically synthesizes theorems and proofs for the purpose of training a theorem prover. Experiments on real-world tasks demonstrate that synthetic data from our approach improves the theorem prover and advances the state of the art of automated theorem proving in Metamath.
Synthesize, Execute and Debug: Learning to Repair for Neural Program Synthesis
The use of deep learning techniques has achieved significant progress for program synthesis from input-output examples. However, when the program semantics become more complex, it still remains a challenge to synthesize programs that are consistent with the specification. In this work, we propose SED, a neural program generation framework that incorporates synthesis, execution, and debugging stages. Instead of purely relying on the neural program synthesizer to generate the final program, SED first produces initial programs using the neural program synthesizer component, then utilizes a neural program debugger to iteratively repair the generated programs. The integration of the debugger component enables SED to modify the programs based on the execution results and specification, which resembles the coding process of human programmers. On Karel, a challenging input-output program synthesis benchmark, SED reduces the error rate of the neural program synthesizer itself by a considerable margin, and outperforms the standard beam search for decoding.
What killed the cat? Towards a logical formalization of curiosity (and suspense, and surprise) in narratives
de Saint-Cyr, Florence Dupin, Bosser, Anne-Gwenn, Callac, Benjamin, Maisel, Eric
Humans tell stories to make sense of the world and communicate their understanding of what happens. Storytelling supposes to be able to sort out which events are worth telling, deciding on a level of detail for describing events, selecting among possible causes the ones which are deemed worth telling. It also supposes to make use of an affective machinery for capturing an audience's attention (emotional contagion, suspense elicitation...). In the act of storytelling, structural and affective phenomena are thus combined with communicative goals in mind. This combination has indeed shown its effectiveness in this respect: the phenomenon of narrative transportation (the experience of being immersed in a story) has been linked to persuasion [27]. The narrative paradigm therefore provides an appropriate framework, in which causal reasoning about the situations narrated [53] is combined with narrative devices to encourage the audience's emotional involvement [51], to study and model how opinion is formed and evolves. Building a framework for reasoning about and unveiling storytelling mechanics could pave the way for intellectual selfdefense supporting tools, enabling citizens to arm themselves against hostile disinformation or influence campaigns. Previous works in structural narratology have studied the way stories are conveyed to their audience and seminal work from (for instance) Genette [25] or Propp [45] have previously served as the backbone inspiration for computational narrative models and storytelling systems [43].
Experiments with Choice in Dependently-Typed Higher-Order Logic
Ranalter, Daniel, Brown, Chad E., Kaliszyk, Cezary
Recently an extension to higher-order logic -- called DHOL -- was introduced, enriching the language with dependent types, and creating a powerful extensional type theory. In this paper we propose two ways how choice can be added to DHOL. We extend the DHOL term structure by Hilbert's indefinite choice operator $\epsilon$, define a translation of the choice terms to HOL choice that extends the existing translation from DHOL to HOL and show that the extension of the translation is complete and give an argument for soundness. We finally evaluate the extended translation on a set of dependent HOL problems that require choice.
Program Synthesis with Pragmatic Communication
Program synthesis techniques construct or infer programs from user-provided specifications, such as input-output examples. Yet most specifications, especially those given by end-users, leave the synthesis problem radically ill-posed, because many programs may simultaneously satisfy the specification. This work introduces a new inductive bias derived by modeling the program synthesis task as rational communication, drawing insights from recursive reasoning models of pragmatics. Given a specification, we score a candidate program both on its consistency with the specification, and also whether a rational speaker would chose this particular specification to communicate that program. We develop efficient algorithms for such an approach when learning from input-output examples, and build a pragmatic program synthesizer over a simple grid-like layout domain. A user study finds that end-user participants communicate more effectively with the pragmatic program synthesizer over a non-pragmatic one.
Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers
In theorem proving, the task of selecting useful premises from a large library to unlock the proof of a given conjecture is crucially important. This presents a challenge for all theorem provers, especially the ones based on language models, due to their relative inability to reason over huge volumes of premises in text form. This paper introduces Thor, a framework integrating language models and automated theorem provers to overcome this difficulty. In Thor, a class of methods called hammers that leverage the power of automated theorem provers are used for premise selection, while all other tasks are designated to language models. Thor increases a language model's success rate on the PISA dataset from 39\% to 57\%, while solving 8.2\% of problems neither language models nor automated theorem provers are able to solve on their own.