AITopics | Logic & Formal Reasoning


Logic & Formal Reasoning

"I think the best hope for human-level AI is logical AI, based on the formalizing of commonsense knowledge and reasoning in mathematical logic. Formalizing common sense requires extensions to mathematical logic including nonmonotonic reasoning and extensive reification, e.g., of concepts and also contexts. The reifications require appropriate reflection schemas."
– from The Future of AI—A Manifesto by John McCarthy. AI Magazine 26(4), (2005).


Do Large Language Models Excel in Complex Logical Reasoning with Formal Language?

Jiang, Jin, Wang, Jianing, Yan, Yuchen, Liu, Yang, Zhu, Jianhua, Zhang, Mengdi, Cai, Xunliang, Gao, Liangcai

arXiv.org Artificial Intelligence | May-23-2025

Large Language Models (LLMs) have been shown to achieve breakthrough performance on complex logical reasoning tasks. Nevertheless, most existing research focuses on employing formal language to guide LLMs to derive reliable reasoning paths, while systematic evaluations of these capabilities are still limited. In this paper, we aim to conduct a comprehensive evaluation of LLMs across various logical reasoning problems utilizing formal languages. From the perspective of three dimensions, i.e., spectrum of LLMs, taxonomy of tasks, and format of trajectories, our key findings are: 1) Thinking models significantly outperform Instruct models, especially when formal language is employed; 2) All LLMs exhibit limitations in inductive reasoning capability, irrespective of whether they use a formal language; 3) Data with PoT format achieves the best generalization performance across other languages. Additionally, we curate the formal-relative training data to further enhance the small language models, and the experimental results indicate that a simple rejected fine-tuning method can better enable LLMs to generalize across formal languages and achieve the best overall performance. Our code and reports are available at https://github.com/jiangjin1999/FormalEval.

large language model, logic & formal reasoning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2505.16998

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


Logic-of-Thought: Empowering Large Language Models with Logic Programs for Solving Puzzles in Natural Language

Li, Naiqi, Liu, Peiyuan, Liu, Zheng, Dai, Tao, Jiang, Yong, Xia, Shu-Tao

arXiv.org Artificial Intelligence | May-23-2025

Solving puzzles in natural language poses a long-standing challenge in AI. While large language models (LLMs) have recently shown impressive capabilities in a variety of tasks, they continue to struggle with complex puzzles that demand precise reasoning and exhaustive search. In this paper, we propose Logic-of-Thought (Logot), a novel framework that bridges LLMs with logic programming to address this problem. Our method leverages LLMs to translate puzzle rules and states into answer set programs (ASPs), whose solutions are then accurately and efficiently inferred by an ASP interpreter. This hybrid approach combines the natural language understanding of LLMs with the precise reasoning capabilities of logic programs. We evaluate our method on various grid puzzles and dynamic puzzles involving actions, demonstrating near-perfect accuracy across all tasks. Our code and data are available at: https://github.com/naiqili/Logic-of-Thought.

large language model, machine learning, puzzle, (19 more...)

arXiv.org Artificial Intelligence

2505.16114

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)


Data-driven Verification of Procedural Programs with Integer Arrays

Bouajjani, Ahmed, Boutglay, Wael-Amine, Habermehl, Peter

arXiv.org Artificial Intelligence | May-23-2025

We address the problem of automatically verifying procedural programs manipulating parametric-size arrays of integers, encoded as a constrained Horn clauses solving problem. We propose a new algorithmic method for synthesizing loop invariants and procedure pre/post-conditions represented as universally quantified first-order formulas constraining the array elements and program variables. We adopt a data-driven approach that extends the decision tree Horn-ICE framework to handle arrays. We provide a powerful learning technique based on reducing a complex classification problem of vectors of integer arrays to a simpler classification problem of vectors of integers. The obtained classifier is generalized to get universally quantified invariants and procedure pre/post-conditions. We have implemented our method and shown its efficiency and competitiveness w.r.t.

artificial intelligence, logic & formal reasoning, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2505.15958

Country:

Europe (1.00)
Asia (0.68)
Oceania (0.67)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.67)


Transductively Informed Inductive Program Synthesis

Zenkner, Janis, Sesterhenn, Tobias, Bartelt, Christian

arXiv.org Artificial Intelligence | May-22-2025

Abstraction and reasoning in program synthesis have seen significant progress through both inductive and transductive paradigms. Inductive approaches generate a program or latent function from input-output examples, which can then be applied to new inputs. Transductive approaches directly predict output values for given inputs, effectively serving as the function itself. Current approaches combine inductive and transductive models via isolated ensembling, but they do not explicitly model the interaction between both paradigms. In this work, we introduce TIIPS, a novel framework that unifies transductive and inductive strategies by explicitly modeling their interactions through a cooperative mechanism: an inductive model generates programs, while a transductive model constrains, guides, and refines the search to improve synthesis accuracy and generalization. We evaluate TIIPS on two widely studied program synthesis domains: string and list manipulation. Our results show that TIIPS solves more tasks and yields functions that more closely match optimal solutions in syntax and semantics, particularly in out-of-distribution settings, yielding state-of-the-art performance. We believe that explicitly modeling the synergy between inductive and transductive reasoning opens promising avenues for general-purpose program synthesis and broader applications.
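To make the inductive paradigm described in this abstract concrete, the following is a minimal sketch of enumerative synthesis over a toy string DSL. It is not the authors' system: the primitive set `PRIMITIVES`, the helpers `run` and `synthesize`, and the examples are all hypothetical, and the transductive guidance that the abstract describes is not modeled here.

```python
from itertools import product

# Hypothetical toy DSL: each primitive maps a string to a string.
PRIMITIVES = {
    "upper": str.upper,
    "lower": str.lower,
    "reverse": lambda s: s[::-1],
    "strip": str.strip,
}


def run(program, text):
    """Apply the primitives named in `program` to `text`, left to right."""
    for name in program:
        text = PRIMITIVES[name](text)
    return text


def synthesize(examples, max_length=3):
    """Return the first (shortest) program consistent with all examples, or None."""
    for length in range(1, max_length + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, inp) == out for inp, out in examples):
                return program
    return None


# Input-output examples specify the task; no program is given up front.
examples = [(" abc ", "CBA"), ("xy", "YX")]
program = synthesize(examples)

# The synthesized program can then be applied to an unseen input.
result = run(program, " hello ")
```

Because the search returns an explicit program, it can be inspected and reused on new inputs, which is the property that distinguishes the inductive setting from directly predicting outputs.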

artificial intelligence, logic & formal reasoning, trajectory, (15 more...)

arXiv.org Artificial Intelligence

2505.14744

Genre:

Research Report > New Finding (0.86)
Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.75)


HybridProver: Augmenting Theorem Proving with LLM-Driven Proof Synthesis and Refinement

Hu, Jilin, Zhang, Jianyu, Zhao, Yongwang, Ringer, Talia

arXiv.org Artificial Intelligence | May-22-2025

Formal methods is pivotal for verifying the reliability of critical systems through rigorous mathematical proofs. However, its adoption is hindered by labor-intensive manual proofs and the expertise required to use theorem provers. Recent advancements in large language models (LLMs) offer new opportunities for automated theorem proving. Two promising approaches are generating tactics step by step and generating a whole proof directly with an LLM. However, existing work makes no attempt to combine the two approaches. In this work, we introduce HybridProver, a dual-model proof synthesis framework that combines tactic-based generation and whole-proof synthesis to harness the benefits of both approaches. HybridProver generates whole proof candidates for evaluation directly, then extracts proof sketches from those candidates. It then uses a tactic-based generation model that integrates automated tools to complete the sketches via stepwise refinement. We implement HybridProver for the Isabelle theorem prover and fine-tune LLMs on our optimized Isabelle datasets. Evaluation on the miniF2F dataset illustrates HybridProver's effectiveness. We achieve a 59.4% success rate on miniF2F, where the previous SOTA is 56.1%. Our ablation studies show that this SOTA result is attributable to combining whole-proof and tactic-based generation. Additionally, we show how the dataset quality, training parameters, and sampling diversity affect the final result during automated theorem proving with LLMs. All of our code, datasets, and LLMs are open source.

large language model, logic & formal reasoning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2505.1574

Country: North America > United States > California > Los Angeles County (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)


On the Boolean Network Theory of Datalog$^\neg$

Trinh, Van-Giang, Benhamou, Belaid, Soliman, Sylvain, Fages, François

arXiv.org Artificial Intelligence | May-21-2025

Datalog$^\neg$ is a central formalism used in a variety of domains ranging from deductive databases and abstract argumentation frameworks to answer set programming. Its model theory is the finite counterpart of the logical semantics developed for normal logic programs, mainly based on the notions of Clark's completion and two-valued or three-valued canonical models including supported, stable, regular and well-founded models. In this paper we establish a formal link between Datalog$^\neg$ and Boolean network theory first introduced for gene regulatory networks. We show that in the absence of odd cycles in a Datalog$^\neg$ program, the regular models coincide with the stable models, which entails the existence of stable models, and in the absence of even cycles, we prove the uniqueness of stable partial models and regular models. This connection also gives new upper bounds on the numbers of stable partial, regular, and stable models of a Datalog$^\neg$ program using the cardinality of a feedback vertex set in its atom dependency graph. Interestingly, our connection to Boolean network theory also points us to the notion of trap spaces. In particular we show the equivalence between subset-minimal stable trap spaces and regular models.
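As a small illustration of the stable models mentioned in this abstract, the sketch below enumerates them by brute force for propositional programs with negation, using the Gelfond-Lifschitz reduct. It is not the Boolean-network method of the paper; the rule encoding and the helper names (`rule`, `least_model`, `is_stable`, `stable_models`) are assumptions made for this example, and the enumeration is exponential in the number of atoms.

```python
from itertools import combinations


def rule(head, pos=(), neg=()):
    """Encode `head :- pos..., not neg...` as a tuple (hypothetical encoding)."""
    return (head, frozenset(pos), frozenset(neg))


def least_model(definite_rules):
    """Least fixpoint of a negation-free program given as (head, pos) pairs."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if head not in model and pos <= model:
                model.add(head)
                changed = True
    return model


def is_stable(rules, candidate):
    """Check the candidate against the least model of its Gelfond-Lifschitz reduct."""
    # Drop rules blocked by the candidate; strip negative literals from the rest.
    reduct = [(h, pos) for h, pos, neg in rules if not (neg & candidate)]
    return least_model(reduct) == candidate


def stable_models(rules):
    """Enumerate all stable models by testing every subset of atoms."""
    atoms = sorted({h for h, _, _ in rules} | {a for _, p, n in rules for a in p | n})
    found = []
    for size in range(len(atoms) + 1):
        for subset in combinations(atoms, size):
            if is_stable(rules, set(subset)):
                found.append(set(subset))
    return found


# Even cycle through negation:  p :- not q.   q :- not p.
even_cycle = [rule("p", neg=["q"]), rule("q", neg=["p"])]
# Odd cycle through negation:   a :- not a.
odd_cycle = [rule("a", neg=["a"])]

even_models = stable_models(even_cycle)  # two stable models: {p} and {q}
odd_models = stable_models(odd_cycle)    # no stable model
```

The two programs are consistent with the role of cycles stated in the abstract: the program with an even cycle has more than one stable model, and the program with an odd cycle has none.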

artificial intelligence, logic & formal reasoning, nonmonotonic reasoning, (18 more...)

arXiv.org Artificial Intelligence

2504.15417

Country: Europe (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Nonmonotonic Logic (0.92)


Modelling Real-time Systems with Bigraphs

Albalwe, Maram, Archibald, Blair, Sevegnani, Michele

arXiv.org Artificial Intelligence | May-21-2025

Bigraphical Reactive Systems (BRSs) are a graph-rewriting formalism describing systems evolving in two dimensions: spatially, e.g. a person in a room, and non-spatially, e.g. mobile phones communicating regardless of location. Despite use in domains including communication protocols, agent programming, biology, and security, there is no support for real-time systems. We extend BRSs to support real-time systems with a modelling approach that uses multiple perspectives to represent digital clocks. We use Action BRSs, a recent extension of BRSs, where the resulting transition system is a Markov Decision Process (MDP). This allows a natural representation of the choices in each system state: to either allow time to pass or perform a specific action. We implement our proposed approach using the BigraphER toolkit, and demonstrate the effectiveness through multiple examples including modelling cloud system requests.

logic & formal reasoning, real time system, transition, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.4204/EPTCS.417.6

2505.13449

Country: Europe > Austria (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.46)


Learning Probabilistic Temporal Logic Specifications for Stochastic Systems

Roy, Rajarshi, Pote, Yash, Parker, David, Kwiatkowska, Marta

arXiv.org Artificial Intelligence | May-20-2025

There has been substantial progress in the inference of formal behavioural specifications from sample trajectories, for example using Linear Temporal Logic (LTL). However, these techniques cannot handle specifications that correctly characterise systems with stochastic behaviour, which occur commonly in reinforcement learning and formal verification. We consider the passive learning problem of inferring a Boolean combination of probabilistic LTL (PLTL) formulas from a set of Markov chains, classified as either positive or negative. We propose a novel learning algorithm that infers concise PLTL specifications, leveraging grammar-based enumeration, search heuristics, probabilistic model checking and Boolean set-cover procedures. We demonstrate the effectiveness of our algorithm in two use cases: learning from policies induced by RL algorithms and learning from variants of a probabilistic model. In both cases, our method automatically and efficiently extracts PLTL specifications that succinctly characterize the temporal differences between the policies or model variants.
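The sketch below illustrates what it means for a single probabilistic formula to separate a positive Markov chain from a negative one. It hand-rolls a bounded-reachability check for a formula of the shape "with probability at least `threshold`, a `target` state is reached within `steps` transitions". It does not reproduce the authors' algorithm or any model-checking tool; the chains, labels, and function names are hypothetical.

```python
def prob_eventually(chain, labels, start, target, steps):
    """Probability of reaching a state labelled `target` within `steps` transitions.

    `chain` maps each state to a dict {successor: probability};
    `labels` maps each state to the set of atomic propositions true in it.
    """
    # With zero transitions left, only states already labelled `target` succeed.
    prob = {s: 1.0 if target in labels[s] else 0.0 for s in chain}
    for _ in range(steps):
        prob = {
            s: 1.0 if target in labels[s]
            else sum(p * prob[t] for t, p in chain[s].items())
            for s in chain
        }
    return prob[start]


def satisfies(chain, labels, start, target, steps, threshold):
    """Check the bounded formula: P(reach `target` within `steps`) >= threshold."""
    return prob_eventually(chain, labels, start, target, steps) >= threshold


labels = {"s0": set(), "goal": {"goal"}}

# Hypothetical positive example: the goal is reached quickly with high probability.
positive = {"s0": {"goal": 0.9, "s0": 0.1}, "goal": {"goal": 1.0}}
# Hypothetical negative example: the chain tends to stay in s0.
negative = {"s0": {"goal": 0.2, "s0": 0.8}, "goal": {"goal": 1.0}}

p_pos = prob_eventually(positive, labels, "s0", "goal", steps=2)  # 0.9 + 0.1 * 0.9
p_neg = prob_eventually(negative, labels, "s0", "goal", steps=2)  # 0.2 + 0.8 * 0.2
```

With a threshold of 0.5 and a bound of two steps, the formula holds on the positive chain and fails on the negative one, so it is a consistent (single-formula) classifier for this toy sample.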

formula, logic & formal reasoning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2505.12107

Country:

Asia > Macao (0.04)
Asia > China (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(11 more...)

Genre: Research Report (0.40)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)


MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation

Liang, Zhenwen, Song, Linfeng, Li, Yang, Yang, Tao, Zhang, Feng, Mi, Haitao, Yu, Dong

arXiv.org Artificial Intelligence | May-19-2025

Automated Theorem Proving (ATP) in formal languages remains a formidable challenge in AI, demanding rigorous logical deduction and navigating vast search spaces. While large language models (LLMs) have shown promising performance, existing stepwise provers often suffer from biased search guidance, leading to inefficiencies and suboptimal proof strategies. This paper introduces the Multi-Perspective Search Prover (MPS-Prover), a novel stepwise ATP system designed to overcome these limitations. MPS-Prover incorporates two key innovations: a highly effective post-training data curation strategy that prunes approximately 40% of redundant training data without sacrificing performance, and a multi-perspective tree search mechanism. This search integrates a learned critic model with strategically designed heuristic rules to diversify tactic selection, prevent getting trapped in unproductive states, and enhance search robustness. Extensive evaluations demonstrate that MPS-Prover achieves state-of-the-art performance on multiple challenging benchmarks, including miniF2F and ProofNet, outperforming prior 7B parameter models. Furthermore, our analyses reveal that MPS-Prover generates significantly shorter and more diverse proofs compared to existing stepwise and whole-proof methods, highlighting its efficiency and efficacy. Our work advances the capabilities of LLM-based formal reasoning and offers a robust framework and a comprehensive analysis for developing more powerful theorem provers.

large language model, logic & formal reasoning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2505.10962

Country:

Europe > Italy (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)


Forgetting in short and heterogeneous sequences of belief revisions

Liberatore, Paolo

arXiv.org Artificial Intelligence | May-19-2025

Forgetting a specific belief revision episode may not erase information because the other revisions may provide or entail the same information. Whether it does was proved coNP-hard for sequences of two arbitrary lexicographic revisions or arbitrarily long lexicographic Horn revisions. A polynomial algorithm is presented for the case of two lexicographic Horn revisions. Heterogeneous sequences, including revisions other than lexicographic, were proved to belong to $\Delta^p_2$. Their previously proved coNP-hardness is enhanced to $D^p$-hardness.

artificial intelligence, belief revision, logic & formal reasoning, (20 more...)

arXiv.org Artificial Intelligence

2504.13986

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (1.00)
