 Logic & Formal Reasoning


Certified MaxSAT Preprocessing

arXiv.org Artificial Intelligence

Building on the progress in Boolean satisfiability (SAT) solving over the last decades, maximum satisfiability (MaxSAT) has become a viable approach for solving NP-hard optimization problems, but ensuring correctness of MaxSAT solvers has remained an important concern. For SAT, this is largely a solved problem thanks to the use of proof logging, meaning that solvers emit machine-verifiable proofs of (un)satisfiability to certify correctness. However, for MaxSAT, proof logging solvers have started being developed only very recently. Moreover, these nascent efforts have only targeted the core solving process, ignoring the preprocessing phase where input problem instances can be substantially reformulated before being passed on to the solver proper. In this work, we demonstrate how pseudo-Boolean proof logging can be used to certify the correctness of a wide range of modern MaxSAT preprocessing techniques. By combining and extending the VeriPB and CakePB tools, we provide formally verified, end-to-end proof checking that the input and preprocessed output MaxSAT problem instances have the same optimal value. An extensive evaluation on applied MaxSAT benchmarks shows that our approach is feasible in practice.


The Trembling-Hand Problem for LTLf Planning

arXiv.org Artificial Intelligence

Consider an agent acting to achieve its temporal goal, but with a "trembling hand". In this case, the agent may mistakenly instruct, with a certain (typically small) probability, actions that are not intended due to faults or imprecision in its action selection mechanism, thereby leading to possible goal failure. We study the trembling-hand problem in the context of reasoning about actions and planning for temporally extended goals expressed in Linear Temporal Logic on finite traces (LTLf), where we want to synthesize a strategy (aka plan) that maximizes the probability of satisfying the LTLf goal in spite of the trembling hand. We consider both deterministic and nondeterministic (adversarial) domains. We propose solution techniques for both cases by relying respectively on Markov Decision Processes and on Markov Decision Processes with Set-valued Transitions with LTLf objectives, where the set-valued probabilistic transitions capture both the nondeterminism from the environment and the possible action instruction errors from the agent. We formally show the correctness of our solution techniques and demonstrate their effectiveness experimentally through a proof-of-concept implementation.
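The deterministic-domain case can be illustrated with a small sketch. It is not the paper's construction: it assumes a plain reachability goal instead of a full LTLf objective, a finite horizon, and a tremble model in which the intended action is executed with probability 1 - epsilon and the remaining probability is split uniformly over the other actions. The names `max_success_probability`, `corridor_step`, and the corridor domain are hypothetical.

```python
from typing import Callable, Dict, Sequence, Set


def max_success_probability(
    states: Sequence[int],
    actions: Sequence[str],
    step: Callable[[int, str], int],
    goal: Set[int],
    epsilon: float,
    horizon: int,
) -> Dict[int, float]:
    """Finite-horizon value iteration on the MDP induced by a trembling hand.

    Assumption: the intended action is executed with probability 1 - epsilon;
    otherwise one of the other actions is executed, chosen uniformly.
    Returns, per state, the maximum probability of reaching `goal`
    within `horizon` steps.
    """
    value = {s: (1.0 if s in goal else 0.0) for s in states}
    for _ in range(horizon):
        new_value = {}
        for s in states:
            if s in goal:
                new_value[s] = 1.0  # goal states are absorbing
                continue
            best = 0.0
            for intended in actions:
                others = [a for a in actions if a != intended]
                if others:
                    p = (1.0 - epsilon) * value[step(s, intended)]
                    p += sum(epsilon / len(others) * value[step(s, a)] for a in others)
                else:
                    # With a single action there is nothing to tremble to.
                    p = value[step(s, intended)]
                best = max(best, p)
            new_value[s] = best
        value = new_value
    return value


def corridor_step(state: int, action: str) -> int:
    """Deterministic corridor with cells 0..3; moves are clamped at the walls."""
    if action == "right":
        return min(state + 1, 3)
    return max(state - 1, 0)


values = max_success_probability(
    states=[0, 1, 2, 3],
    actions=["left", "right"],
    step=corridor_step,
    goal={3},
    epsilon=0.1,
    horizon=3,
)
print(values[0])  # three intended "right" moves must all succeed: about 0.9 ** 3
```

With only three steps available from cell 0, every instruction must be carried out as intended, so the best achievable success probability is roughly 0.729 rather than 1.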


The Framework of a Design Process Language

arXiv.org Artificial Intelligence

The thesis develops a view of design in a concept formation framework and outlines a language to describe both the object of the design and the process of designing. The unknown object at the outset of the design work may be seen as an unknown concept that the designer is to define. Throughout the process, she develops a description of this object by relating it to known concepts. The search stops when the designer is satisfied that the design specification is complete enough for the object to satisfy its requirements once built. The specification is then a collection of propositions that all contribute towards defining the design object - a collection of sentences describing relationships between the object and known concepts. The design process itself may also be described by relating known concepts - by organizing known abilities into particular patterns of activation, or mobilization. In view of the demands placed on a language to be used in this concept formation process, the framework of a Design Process Language (DPL) is developed. The basis for the language is a set of linguistic categories that act as classes of relations used to combine concepts. These categories contain relations for describing process and object within the same general system, with some relations being process specific, others being object specific, and the bulk being used for both process and object description. Another outcome is the distinction of modal relations, or relations describing futurity, possibility, willingness, hypothetical events, and the like. The design process almost always includes aspects such as these, and a language facilitating design process description must therefore support the construction of such relationships. The DPL is argued to be a foundation on which to build a language that enables computers to be more useful - to act more intelligently - in the design process.


P-NAL: an Effective and Interpretable Entity Alignment Method

arXiv.org Artificial Intelligence

Entity alignment (EA) aims to find equivalent entities between two Knowledge Graphs. Existing embedding-based EA methods usually encode entities as embeddings and triples as constraints on those embeddings, and learn to align the embeddings. Structural and side information is usually exploited via embedding propagation, aggregation or interaction. However, the details of the underlying logical inference steps in the alignment process are usually omitted, resulting in an inadequate inference process. In this paper, we introduce P-NAL, an entity alignment method that captures two types of logical inference paths with Non-Axiomatic Logic (NAL). Type 1 is the bridge-like inference path between to-be-aligned entity pairs, consisting of two relation/attribute triples and a similarity sentence between the other two entities. Type 2 links the entity pair by their embeddings. P-NAL iteratively aligns entities and relations by integrating the conclusions of the inference paths. Moreover, our method is logically interpretable and extensible due to the expressiveness of NAL. Our proposed method is suitable for various EA settings. Experimental results show that our method outperforms state-of-the-art methods in terms of Hits@1, achieving 0.98+ on all three datasets of DBP15K in both supervised and unsupervised settings. To our knowledge, we present the first in-depth analysis of entity alignment's basic principles from a unified logical perspective.


NL2FOL: Translating Natural Language to First-Order Logic for Logical Fallacy Detection

arXiv.org Artificial Intelligence

Logical fallacies are common errors in reasoning that undermine the logic of an argument. Automatically detecting logical fallacies has important applications in tracking misinformation and validating claims. In this paper, we design a process to reliably detect logical fallacies by translating natural language to First-order Logic (FOL) step-by-step using Large Language Models (LLMs). We then utilize Satisfiability Modulo Theory (SMT) solvers to reason about the validity of the formula and classify inputs as either a fallacy or a valid statement. Our model also provides a novel means of utilizing LLMs to interpret the output of the SMT solver, offering insights into the counter-examples that illustrate why a given sentence is considered a logical fallacy. Our approach is robust, interpretable and does not require training data or fine-tuning. We evaluate our model on a mixed dataset of fallacies and valid sentences. The results demonstrate improved performance compared to end-to-end LLMs, with our classifier achieving an F1-score of 71% on the Logic dataset. The approach is able to generalize effectively, achieving an F1-score of 73% on the challenge set, LogicClimate, outperforming state-of-the-art models by 21% despite its much smaller size.
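The validity-checking step can be illustrated with a deliberately simplified sketch. It assumes the argument has already been translated to a propositional formula (the paper targets first-order logic and SMT solvers, which this does not reproduce) and replaces the solver with a brute-force truth-table search. The helper names `implies` and `find_counterexample` are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence

Assignment = Dict[str, bool]


def implies(a: bool, b: bool) -> bool:
    """Material implication a -> b."""
    return (not a) or b


def find_counterexample(
    variables: Sequence[str],
    formula: Callable[[Assignment], bool],
) -> Optional[Assignment]:
    """Return an assignment that falsifies `formula`, or None if it is valid.

    Stand-in for a solver call: a formula is valid exactly when its
    negation is unsatisfiable, so any falsifying assignment is a
    counterexample explaining why the argument form is a fallacy.
    """
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if not formula(assignment):
            return assignment
    return None


def affirming_the_consequent(v: Assignment) -> bool:
    # ((p -> q) and q) -> p : a classic fallacy
    return implies(implies(v["p"], v["q"]) and v["q"], v["p"])


def modus_ponens(v: Assignment) -> bool:
    # ((p -> q) and p) -> q : a valid argument form
    return implies(implies(v["p"], v["q"]) and v["p"], v["q"])


fallacy_witness = find_counterexample(["p", "q"], affirming_the_consequent)
valid_witness = find_counterexample(["p", "q"], modus_ponens)
print(fallacy_witness)  # {'p': False, 'q': True}
print(valid_witness)    # None
```

The falsifying assignment plays the role of the counter-example mentioned in the abstract: q holds and p implies q, yet p is false.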


Monitoring Second-Order Hyperproperties

arXiv.org Artificial Intelligence

Hyperproperties express the relationship between multiple executions of a system. This is needed in many AI-related fields, such as knowledge representation and planning, to capture system properties related to knowledge, information flow, and privacy. In this paper, we study the monitoring of complex hyperproperties at runtime. Previous work in this area has either focused on the simpler problem of monitoring trace properties (which are sets of traces, while hyperproperties are sets of sets of traces) or on monitoring first-order hyperproperties, which are expressible in temporal logics with first-order quantification over traces, such as HyperLTL. We present the first monitoring algorithm for the much more expressive class of second-order hyperproperties. Second-order hyperproperties include system properties like common knowledge, which cannot be expressed in first-order logics like HyperLTL. We introduce Hyper$^2$LTL$_f$, a temporal logic over finite traces that allows for second-order quantification over sets of traces. We study the monitoring problem in two fundamental execution models: (1) the parallel model, where a fixed number of traces is monitored in parallel, and (2) the sequential model, where an unbounded number of traces is observed sequentially, one trace after the other. For the parallel model, we show that the monitoring of the second-order hyperproperties of Hyper$^2$LTL$_f$ can be reduced to monitoring first-order hyperproperties. For the sequential model, we present a monitoring algorithm that handles second-order quantification efficiently, exploiting optimizations based on the monotonicity of subformulas, graph-based storing of executions, and fixpoint hashing. We present experimental results from a range of benchmarks, including examples from common knowledge and planning.


Flow-Based Synthesis of Reactive Tests for Discrete Decision-Making Systems with Temporal Logic Specifications

arXiv.org Artificial Intelligence

Designing tests to evaluate whether a given autonomous system satisfies complex specifications is challenging due to the complexity of these systems. This work proposes a flow-based approach for reactive test synthesis from temporal logic specifications, enabling the synthesis of test environments consisting of static and reactive obstacles and dynamic test agents. The temporal logic specifications describe desired test behavior, including system requirements as well as a test objective that is not revealed to the system. The synthesized test strategy places restrictions on system actions in reaction to the system state. The tests are minimally restrictive and accomplish the test objective while ensuring realizability of the system's objective without aiding it (semi-cooperative setting). Automata theory and flow networks are leveraged to formulate a mixed-integer linear program (MILP) to synthesize the test strategy. For a dynamic test agent, the agent strategy is synthesized for a GR(1) specification constructed from the solution of the MILP. If the specification is unrealizable by the dynamics of the test agent, a counterexample-guided approach is used to re-solve the MILP until a strategy is found. This flow-based, reactive test synthesis is conducted offline and is agnostic to the system controller. Finally, the resulting test strategy is demonstrated in simulation and experimentally on a pair of quadrupedal robots for a variety of specifications.


Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry

arXiv.org Artificial Intelligence

Proving geometric theorems constitutes a hallmark of visual reasoning combining both intuitive and logical skills. Therefore, automated theorem proving of Olympiad-level geometry problems is considered a notable milestone in human-level automated reasoning. The introduction of AlphaGeometry, a neuro-symbolic model trained with 100 million synthetic samples, marked a major breakthrough. It solved 25 of 30 International Mathematical Olympiad (IMO) problems whereas the reported baseline based on Wu's method solved only ten. In this note, we revisit the IMO-AG-30 Challenge introduced with AlphaGeometry, and find that Wu's method is surprisingly strong. Wu's method alone can solve 15 problems, and some of them are not solved by any of the other methods. This leads to two key findings: (i) Combining Wu's method with the classic synthetic methods of deductive databases and angle, ratio, and distance chasing solves 21 out of 30 problems using just a CPU-only laptop with a time limit of 5 minutes per problem. Essentially, this classic method solves just 4 problems fewer than AlphaGeometry and establishes the first fully symbolic baseline strong enough to rival the performance of an IMO silver medalist. (ii) Wu's method even solves 2 of the 5 problems that AlphaGeometry failed to solve. Thus, by combining AlphaGeometry with Wu's method we set a new state-of-the-art for automated theorem proving on IMO-AG-30, solving 27 out of 30 problems, making it the first AI method to outperform an IMO gold medalist.


Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving

arXiv.org Artificial Intelligence

Recent advances in Automated Theorem Proving have shown the effectiveness of leveraging a (large) language model that generates tactics (i.e. proof steps) to search through proof states. The current model, while trained solely on successful proof paths, faces a discrepancy at the inference stage, as it must sample and try various tactics at each proof state until finding success, unlike its training which does not incorporate learning from failed attempts. Intuitively, a tactic that leads to a failed search path would indicate that similar tactics should receive less attention during the following trials. In this paper, we demonstrate the benefit of training models that additionally learn from failed search paths. Facing the lack of such trial-and-error data in existing open-source theorem-proving datasets, we curate a dataset on intuitionistic propositional logic theorems and formalize it in Lean, such that we can reliably check the correctness of proofs. We compare our model trained on relatively short trial-and-error information (TrialMaster) with models trained only on the correct paths and discover that the former solves more unseen theorems with fewer trial searches.
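For readers unfamiliar with the setting, the following Lean 4 snippet shows what tactic-style proofs of intuitionistic propositional theorems can look like. The two statements are illustrative assumptions chosen for brevity and are not taken from the paper's dataset; both proofs avoid classical axioms.

```lean
-- Conjunction is commutative: each tactic line is one proof step.
example (p q : Prop) : p ∧ q → q ∧ p := by
  intro h
  exact ⟨h.2, h.1⟩

-- Implication is transitive.
example (p q r : Prop) : (p → q) → (q → r) → p → r := by
  intro hpq hqr hp
  exact hqr (hpq hp)
```

A tactic that does not make progress on such a goal, and forces the search to backtrack, is the kind of failed step that trial-and-error training data would record.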


Teaching Higher-Order Logic Using Isabelle

arXiv.org Artificial Intelligence

Higher-order logic, also known as simple type theory [3], has been described as the combination of functional programming and logic [9], and has proved a very powerful tool for the formalization of mathematics and computer science. It is an expressive enough logic to cover a wide array of fields, while still being built on relatively simple principles, and a number of proof assistants based on higher-order logic are available. We consider formal reasoning in the generic proof assistant Isabelle [10, 11]. In the present paper we are taking advantage of the genericity of Isabelle, but we also find that Isabelle is at least as user-friendly and intuitive as other proof assistants of comparable power. Although Isabelle is generic and comes with a number of object logics like first-order logic (FOL) and axiomatic set theory (ZF), the default object logic is higher-order logic, called Isabelle/HOL.