AITopics | logic & formal reasoning

Collaborating Authors

logic & formal reasoning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation

Neural Information Processing SystemsJun-22-2026, 18:11:58 GMT

Automated Theorem Proving (ATP) in formal languages remains a formidable challenge in AI, demanding rigorous logical deduction and navigating vast search spaces. While large language models (LLMs) have shown promising performance, existing stepwise provers often suffer from biased search guidance, leading to inefficiencies and suboptimal proof strategies. This paper introduces the MultiPerspective Search Prover (MPS-Prover), a novel stepwise ATP system designed to overcome these limitations. MPS-Prover incorporates two key innovations: a highly effective post-training data curation strategy that prunes approximately 40% of redundant training data without sacrificing performance, and a multi-perspective tree search mechanism. This search integrates a learned critic model with strategically designed heuristic rules to diversify tactic selection, prevent getting trapped in unproductive states, and enhance search robustness. Extensive evaluations demonstrate that MPS-Prover achieves state-of-the-art performance on multiple challenging benchmarks, including miniF2F and ProofNet, outperforming prior 7B parameter models. Furthermore, our analyses reveal that MPS-Prover generates significantly shorter and more diverse proofs compared to existing stepwise and whole-proof methods, highlighting its efficiency and efficacy. Our work advances the capabilities of LLM-based formal reasoning and offers a robust framework and a comprehensive analysis for developing more powerful theorem provers.

large language model, logic & formal reasoning, natural language, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

Automaton Constrained Q-Learning

Neural Information Processing SystemsJun-22-2026, 16:53:38 GMT

Real-world robotic tasks often require agents to achieve sequences of goals while respecting time-varying safety constraints. However, standard Reinforcement Learning (RL) paradigms are fundamentally limited in these settings. A natural approach to these problems is to combine RL with Linear-time Temporal Logic (LTL), a formal language for specifying complex, temporally extended tasks and safety constraints.

logic & formal reasoning, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.45)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Metric Automata Theory: AUnifying Theory of RNNs

Neural Information Processing SystemsJun-22-2026, 14:49:42 GMT

We propose Metric Automata Theory, an elegant generalisation of classic Automata Theory to continuous dynamical systems, that constitutes a unifying theory of all kinds of Recurrent Neural Networks (RNNs), including widely-adopted architectures such as xLSTM and State Space Models (SSMs). The theory allows one to analyse RNNs both in the finite and unbounded precision settings seamlessly, while utilising fundamental results of Automata Theory. It also provides a novel notion of robustness that guarantees numerical stability and contributes to stability of learning. We employ the theory to prove a comprehensive set of expressivity results for widely-adopted RNNs, with a focus on robustness and finite-precision. Notably, we contrast the capabilities of xLSTM and SSMs for robustly modelling all star-free regular languages--xLSTM can do so, while SSMs cannot robustly recognize the FLIP-FLOP language.

artificial intelligence, logic & formal reasoning, machine learning, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CLEVER: ACurated Benchmark for Formally Verified Code Generation

Neural Information Processing SystemsJun-21-2026, 10:26:47 GMT

We introduce CLEVER1, a high-quality, curated benchmark of 161 problems for end-to-end verified code generation in Lean. Each problem consists of (1) the task of generating a specification that matches a held-out ground-truth specification, and (2) the task of generating a Lean implementation that provably satisfies this specification. Unlike prior benchmarks, CLEVER avoids test-case supervision, LLM-generated annotations, and specifications that leak implementation logic or allow vacuous solutions. All outputs are verified post-hoc using Lean's type checker to ensure machine-checkable correctness. We use CLEVER to evaluate several few-shot and agentic approaches based on state-of-the-art language models. These methods all struggle to achieve full verification, establishing it as a challenging frontier benchmark for program synthesis and formal reasoning. Our benchmark can be found on GitHub as well as HuggingFace. All our evaluation code is also available online.

Add feedback

HypoBootstrap: ABootstrapping Framework for Inductive Reasoning

Neural Information Processing SystemsJun-20-2026, 18:16:00 GMT

Inductive reasoning infers general rules from observed evidence, which is one of the most critical intelligence abilities. Previous works have succeeded in formal languages but suffer from onerous and error-prone conversions between a particular formal language and the working language. As large language models (LLMs) have emerged, direct reasoning with various kinds of languages, especially natural languages, without formal language involvement has become feasible. However, existing LLM-based inductive reasoning usually relies on LLM's intrinsic generation ability, which is prone to LLM's hallucination and lacks systematic guidance according to the nature of inductive reasoning. To this end, we propose HypoBootstrap, an integrated framework for inductive reasoning that generates and confirms hypotheses both in a bootstrapping manner. Regarding hypothesis generation, we propose a novel bootstrapping generation strategy, bootstrapping object hypotheses, relational hypotheses, and functional hypotheses successively, which assists LLM in observing the evidence from trivial patterns to non-trivial patterns. Regarding hypothesis confirmation, we utilize Glymour's theory of bootstrap confirmation, a hypothesis confirmation theory from the philosophy of science that can confirm a set of hypotheses. We use its principles to confirm the object hypotheses, relational hypotheses, and functional hypotheses. Empirical studies on four inductive reasoning scenarios of different natures, involving causal induction, concept learning, grammar learning, and abstract reasoning, demonstrate that HypoBootstrap significantly outperforms existing methods.

large language model, logic & formal reasoning, machine learning, (23 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States (0.92)
Asia (0.67)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.74)

Add feedback

miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward

Neural Information Processing SystemsJun-18-2026, 11:22:38 GMT

We perform a thorough analysis of the formal and informal statements in the miniF2F benchmark from the perspective of an AI system that is tasked to participate in a math Olympiad consisting of the problems in miniF2F. In such setting, the model has to read and comprehend the problems in natural language, formalize them in Lean language, then proceed with proving the problems, and it will get credit for each problem if the formal proof corresponds to the original informal statement presented to the model. Our evaluation results reveal that the best accuracy of such pipeline can be about 36% using the SoTA models in the literature, considerably lower than the individual SoTA accuracies, 97% and 69% reported in the autoformalization and theorem proving literature. Analyzing the failure modes, we trace back a considerable portion of this drop to discrepancies between the formal and informal statements for more than half of the problems in miniF2F. We proceed with correcting all the errors, discrepancies and simplifications in formal and informal statements, and present the miniF2F-v2 with fully verified formal and informal statements and proofs. Evaluating the full theorem proving pipeline on miniF2F-v2 leads to the best accuracy of 70%, a significant improvement from the 40% on the original miniF2F, yet indicating considerable misalignment between the autoformalization models and theorem provers. Our deep analysis suggests that a higher quality benchmark can help the community better evaluate progress in the field of formal reasoning and also better diagnose the failure and success modes of autoformalization and theorem proving models.

large language model, logic & formal reasoning, machine learning, (21 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.67)
Leisure & Entertainment > Games > Chess (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Let a Neural Network Be Your Invariant

Neural Information Processing SystemsJun-18-2026, 04:52:23 GMT

Safety verification ensures that a system avoids undesired behaviour. Liveness complements safety, ensuring that the system also achieves its desired objectives. A complete specification of functional correctness must combine both safety and liveness. Proving with mathematical certainty that a system satisfies a safety property demands presenting an appropriate inductive invariant of the system, whereas proving liveness requires showing a measure of progress witnessed by a ranking function. Neural model checking has recently introduced a data-driven approach to the formal verification of reactive systems, albeit focusing on ranking functions and thus addressing liveness properties only.

logic & formal reasoning, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
Asia > Japan (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (0.67)
Information Technology (0.46)
Energy > Power Industry > Utilities > Nuclear (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.93)

Add feedback

62d8cb520f9ba0674daf95491ea60f81-Paper-Conference.pdf

Neural Information Processing SystemsJun-17-2026, 20:00:34 GMT

Modern language models (LMs) exhibit strong deductive reasoning capabilities, yet standard evaluations emphasize correctness while overlooking a key aspect of reasoning: efficiency. In real-world reasoning scenarios, much of the available information is irrelevant, and effective deductive inference requires identifying and ignoring such distractions. We propose a framework for assessing LM reasoning efficiency through the lens of logic programming, introducing a simple method to align proofs written in natural language--as generated by an LM--with shortest proofs found by executing the logic program. Efficiency is quantified by measuring how well a model avoids unnecessary inference. Empirically, we construct a dataset of math word problems injected with various number of irrelevant axioms that vary in semantic overlap with the goal theorem. We find that current LMs show marked accuracy declines under such conditions--even with minimal, domainconsistent distractions--and the proofs they generate frequently exhibit detours through irrelevant inferences.2

large language model, logic & formal reasoning, machine learning, (22 more...)

Neural Information Processing Systems

Country:

Asia (1.00)
Europe (0.92)
North America > United States (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

SpecMAS: AMulti-Agent System for Self-Verifying System Generation via Formal Model Checking

Neural Information Processing SystemsJun-17-2026, 18:35:06 GMT

Given a Standard Operating Procedure (SOP) describing a target system, SpecMAS parses the specification, identifies relevant operational modes, variables, transitions, and properties, and generates a formal model in NuSMV code syntax, an industry-standard symbolic model checker. A dedicated reasoning agent extracts both explicit and implicit properties from the SOP, and verification is performed via temporal logic model checking. If any properties fail to verify, an autonomous debugging agent analyzes counterexamples and iteratively corrects the model until all properties are satisfied. This closed-loop system design guarantees provable correctness by construction and advances the state of the art in automated, interpretable, and deployable verification pipelines. We demonstrate the generality, correctness, and practical feasibility of SpecMAS across a set of representative case studies and propose a new benchmark dataset for the evaluation and comparison of model checking performance.

artificial intelligence, logic & formal reasoning, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Learning Simple Interpolants for Linear Integer Arithmetic

Neural Information Processing SystemsJun-17-2026, 11:28:33 GMT

Craig interpolation plays a central role in formal verification tasks such as model checking, invariant generation, and abstraction refinement. In the domain of linear integer arithmetic (LIA), interpolants are crucial for deriving inductive invariants that characterize unreachable or safe program states, enabling scalable and precise reasoning about software and hardware correctness. Despite progress in interpolation algorithms, generating concise and interpretable interpolants remains a key challenge. We propose a lightweight learning-based approach to generating simple interpolants for LIA. Our model learns to lazily sample input problems directly and is complementary to existing logical methods. We show that when Z3 is guided by our learned model, the complexity of the interpolants it produces can be reduced by up to 47.3%. For older solvers, the reduction rate can reach up to 69.1%.

logic & formal reasoning, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States (0.93)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback