Goto

Collaborating Authors

 Logic & Formal Reasoning


AI for software engineering: from probable to provable

arXiv.org Artificial Intelligence

Vibe coding, the much-touted use of AI techniques for programming, faces two overwhelming obstacles: the difficulty of specifying goals ("prompt engineering" is a form of requirements engineering, one of the toughest disciplines of software engineering); and the hallucination phenomenon. Programs are only useful if they are correct or very close to correct. The solution? Combine the creativity of artificial intelligence with the rigor of formal specification methods and the power of formal program verification, supported by modern proof tools.


Impure Simplicial Complex and Term-Modal Logic with Assignment Operators

arXiv.org Artificial Intelligence

Impure simplicial complexes are a powerful tool to model multi-agent epistemic situations where agents may die, but it is difficult to define a satisfactory semantics for the ordinary propositional modal language on such models, since many conceptually dubious expressions involving dead agents can be expressed in this language. In this paper, we introduce a term-modal language with assignment operators, in which such conceptually dubious expressions are syntactically excluded. We define both simplicial semantics and first-order Kripke semantics for this language, characterize their respective expressivity through notions of bisimulation, and show that the two semantics are equivalent when we consider a special class of first order Kripke models called local epistemic models. We also offer a complete axiomatization for the epistemic logic based on this language, and show that our language has a notion of assignment normal form. Finally, we discuss the behavior of a kind of intensional distributed knowledge that can be naturally expressed in our language.


Comparing State-Representations for DEL Model Checking

arXiv.org Artificial Intelligence

Model checking with the standard Kripke models used in (Dynamic) Epistemic Logic leads to scalability issues. Hence alternative representations have been developed, in particular symbolic structures based on Binary Decision Diagrams (BDDs) and succinct models based on mental programs. While symbolic structures have been shown to perform well in practice, their theoretical complexity was not known so far. On the other hand, for succinct models model checking is known to be PSPACE-complete, but no implementations are available. We close this gap and directly relate the two representations. We show that model checking DEL on symbolic structures encoded with BDDs is also PSPACE-complete. In fact, already model checking Epistemic Logic without dynamics is PSPACE-complete on symbolic structures. We also provide direct translations between BDDs and mental programs. Both translations yield exponential outputs. For the translation from mental programs to BDDs we show that no small translation exists. For the other direction we conjecture the same.


Graded Distributed Belief

arXiv.org Artificial Intelligence

The idea of using belief bases as formal semantics for multi-agent epistemic logic was first introduced in [26] and further developed in [27, 28]. This approach aligns with the sentential (or syntactic) perspective on knowledge representation [21, 13, 33, 20], which holds th at an agent's body of knowledge should be represented as a set of sentences in a formal language. The key novelty of belief base semantics, compared to traditional epistemic logic semantics based on multi-relational Kripke models [31, 12], lies in two main aspects. First, a possible world (or state) in a mo del is not treated as a primitive entity but is instead composed of the agents' belief bases and a valu ation of propositional atoms. Second, the agents' accessibility relations are not explicitly par t of the model but are determined a posteriori from their belief bases.


Synthesizing Visual Concepts as Vision-Language Programs

arXiv.org Artificial Intelligence

Vision-Language models (VLMs) achieve strong performance on multimodal tasks but often fail at systematic visual reasoning tasks, leading to inconsistent or illogical outputs. Neuro-symbolic methods promise to address this by inducing interpretable logical rules, though they exploit rigid, domain-specific perception modules. We propose Vision-Language Programs (VLP), which combine the perceptual flexibility of VLMs with systematic reasoning of program synthesis. Rather than embedding reasoning inside the VLM, VLP leverages the model to produce structured visual descriptions that are compiled into neuro-symbolic programs. The resulting programs execute directly on images, remain consistent with task constraints, and provide human-interpretable explanations that enable easy shortcut mitigation. Experiments on synthetic and real-world datasets demonstrate that VLPs outperform direct and structured prompting, particularly on tasks requiring complex logical reasoning.


ObjectAlign: Neuro-Symbolic Object Consistency Verification and Correction

arXiv.org Artificial Intelligence

Video editing and synthesis often introduce object inconsistencies, such as frame flicker and identity drift that degrade perceptual quality. To address these issues, we introduce ObjectAlign, a novel framework that seamlessly blends perceptual metrics with symbolic reasoning to detect, verify, and correct object-level and temporal inconsistencies in edited video sequences. The novel contributions of ObjectAlign are as follows: First, we propose learnable thresholds for metrics characterizing object consistency (i.e. CLIP-based semantic similarity, LPIPS perceptual distance, histogram correlation, and SAM-derived object-mask IoU). Second, we introduce a neuro-symbolic verifier that combines two components: (a) a formal, SMT-based check that operates on masked object embeddings to provably guarantee that object identity does not drift, and (b) a temporal fidelity check that uses a probabilistic model checker to verify the video's formal representation against a temporal logic specification. A frame transition is subsequently deemed "consistent" based on a single logical assertion that requires satisfying both the learned metric thresholds and this unified neuro-symbolic constraint, ensuring both low-level stability and high-level temporal correctness. Finally, for each contiguous block of flagged frames, we propose a neural network based interpolation for adaptive frame repair, dynamically choosing the interpolation depth based on the number of frames to be corrected. This enables reconstruction of the corrupted frames from the last valid and next valid keyframes. Our results show up to 1.4 point improvement in CLIP Score and up to 6.1 point improvement in warp error compared to SOTA baselines on the DAVIS and Pexels video datasets.


Bridging LLM Planning Agents and Formal Methods: A Case Study in Plan Verification

arXiv.org Artificial Intelligence

We introduce a novel framework for evaluating the alignment between natural language plans and their expected behavior by converting them into Kripke structures and Linear Temporal Logic (LTL) using Large Language Models (LLMs) and performing model checking. We systematically evaluate this framework on a simplified version of the PlanBench plan verification dataset and report on metrics like Accuracy, Precision, Recall and F1 scores. Our experiments demonstrate that GPT-5 achieves excellent classification performance (F1 score of 96.3%) while almost always producing syntactically perfect formal representations that can act as guarantees. However, the synthesis of semantically perfect formal models remains an area for future exploration.


A Rule-Based Approach to Specifying Preferences over Conflicting Facts and Querying Inconsistent Knowledge Bases

arXiv.org Artificial Intelligence

Repair-based semantics have been extensively studied as a means of obtaining meaningful answers to queries posed over inconsistent knowledge bases (KBs). While several works have considered how to exploit a priority relation between facts to select optimal repairs, the question of how to specify such preferences remains largely unaddressed. This motivates us to introduce a declarative rule-based framework for specifying and computing a priority relation between conflicting facts. As the expressed preferences may contain undesirable cycles, we consider the problem of determining when a set of preference rules always yields an acyclic relation, and we also explore a pragmatic approach that extracts an acyclic relation by applying various cycle removal techniques. Towards an end-to-end system for querying inconsistent KBs, we present a preliminary implementation and experimental evaluation of the framework, which employs answer set programming to evaluate the preference rules, apply the desired cycle resolution techniques to obtain a priority relation, and answer queries under prioritized-repair semantics.


REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning

arXiv.org Artificial Intelligence

Nowadays, formal theorem provers have made monumental progress on high-school and competition-level mathematics, but few of them generalize to more advanced mathematics. In this paper, we present REAL-Prover, a new open-source stepwise theorem prover for Lean 4 to push this boundary. This prover, based on our fine-tuned large language model (REAL-Prover-v1) and integrated with a retrieval system (Leansearch-PS), notably boosts performance on solving college-level mathematics problems. To train REAL-Prover-v1, we developed HERALD-AF, a data extraction pipeline that converts natural language math problems into formal statements, and a new open-source Lean 4 interactive environment (Jixia-interactive) to facilitate synthesis data collection. In our experiments, our prover using only supervised fine-tune achieves competitive results with a 23.7% success rate (Pass@64) on the ProofNet dataset-comparable to state-of-the-art (SOTA) models. To further evaluate our approach, we introduce FATE-M, a new benchmark focused on algebraic problems, where our prover achieves a SOTA success rate of 56.7% (Pass@64).


Ternary Gamma Semirings as a Novel Algebraic Framework for Learnable Symbolic Reasoning

arXiv.org Artificial Intelligence

Binary semirings such as the tropical, log, and probability semirings form a core algebraic tool in classical and modern neural inference systems, supporting tasks like Viterbi decoding, dynamic programming, and probabilistic reasoning. However, these structures rely on a binary multiplication operator and therefore model only pairwise interactions. Many symbolic AI tasks are inherently triadic, including subject-predicate-object relations in knowledge graphs, logical rules involving two premises and one conclusion, and multi-entity dependencies in structured decision processes. Existing neural architectures usually approximate these interactions by flattening or factorizing them into binary components, which weakens inductive structure, distorts relational meaning, and reduces interpretability. This paper introduces the Neural Ternary Semiring (NTS), a learnable and differentiable algebraic framework grounded in the theory of ternary Gamma-semirings. The central idea is to replace the usual binary product with a native ternary operator implemented by neural networks and guided by algebraic regularizers enforcing approximate associativity and distributivity. This construction allows triadic relationships to be represented directly rather than reconstructed from binary interactions. We establish a soundness result showing that, when algebraic violations vanish during training, the learned operator converges to a valid ternary Gamma-semiring. We also outline an evaluation strategy for triadic reasoning tasks such as knowledge-graph completion and rule-based inference. These insights demonstrate that ternary Gamma-semirings provide a mathematically principled and practically effective foundation for learnable symbolic reasoning.