AITopics

2506.00615

Country:

Europe (0.94)
North America > United States > California (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.69)

arXiv.org Artificial IntelligenceJun-17-2025

Towards a Formal Specification for Self-organized Shape Formation in Swarm Robotics

Darr, YR, Niazi, MA

The self-organization of robots for the formation of structures and shapes is a stimulating application of the swarm robotic system. It involves a large number of autonomous robots of heterogeneous behavior, coordination among them, and their interaction with the dynamic environment. This process of complex structure formation is considered a complex system, which needs to be modeled by using any modeling approach. Although the formal specification approach along with other formal methods has been used to model the behavior of robots in a swarm. However, to the best of our knowledge, the formal specification approach has not been used to model the self-organization process in swarm robotic systems for shape formation. In this paper, we use a formal specification approach to model the shape formation task of swarm robots. We use Z (Zed) language of formal specification, which is a state-based language, to model the states of the entities of the systems. We demonstrate the effectiveness of Z for the self-organized shape formation. The presented formal specification model gives the outlines for designing and implementing the swarm robotic system for the formation of complex shapes and structures. It also provides the foundation for modeling the complex shape formation process for swarm robotics using a multi-agent system in a simulation-based environment. Keywords: Swarm robotics, Self-organization, Formal specification, Complex systems

artificial intelligence, logic & formal reasoning, robot, (15 more...)

2506.13453

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

arXiv.org Artificial IntelligenceJun-16-2025

Reviving DSP for Advanced Theorem Proving in the Era of Reasoning Models

Cao, Chenrui, Song, Liangcheng, Li, Zenan, Le, Xinyi, Zhang, Xian, Xue, Hui, Yang, Fan

Recent advancements, such as DeepSeek-Prover-V2-671B and Kimina-Prover-Preview-72B, demonstrate a prevailing trend in leveraging reinforcement learning (RL)-based large-scale training for automated theorem proving. Surprisingly, we discover that even without any training, careful neuro-symbolic coordination of existing off-the-shelf reasoning models and tactic step provers can achieve comparable performance. This paper introduces \textbf{DSP+}, an improved version of the Draft, Sketch, and Prove framework, featuring a \emph{fine-grained and integrated} neuro-symbolic enhancement for each phase: (1) In the draft phase, we prompt reasoning models to generate concise natural-language subgoals to benefit the sketch phase, removing thinking tokens and references to human-written proofs; (2) In the sketch phase, subgoals are autoformalized with hypotheses to benefit the proving phase, and sketch lines containing syntactic errors are masked according to predefined rules; (3) In the proving phase, we tightly integrate symbolic search methods like Aesop with step provers to establish proofs for the sketch subgoals. Experimental results show that, without any additional model training or fine-tuning, DSP+ solves 80.7\%, 32.8\%, and 24 out of 644 problems from miniF2F, ProofNet, and PutnamBench, respectively, while requiring fewer budgets compared to state-of-the-arts. DSP+ proves \texttt{imo\_2019\_p1}, an IMO problem in miniF2F that is not solved by any prior work. Additionally, DSP+ generates proof patterns comprehensible by human experts, facilitating the identification of formalization errors; For example, eight wrongly formalized statements in miniF2F are discovered. Our results highlight the potential of classical reasoning patterns besides the RL-based training. All components will be open-sourced.

large language model, logic & formal reasoning, machine learning, (23 more...)

2506.11487

Country:

North America > United States (0.04)
Europe > Germany > Berlin (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

arXiv.org Artificial IntelligenceJun-16-2025

Towards Automated Formal Verification of Backend Systems with LLMs

Xu, Kangping, Luo, Yifan, Yuan, Yang, Yao, Andrew Chi-Chih

Software testing plays a critical role in ensuring that systems behave as intended. However, existing automated testing approaches struggle to match the capabilities of human engineers due to key limitations such as test locality, lack of general reliability, and business logic blindness. In this work, we propose a novel framework that leverages functional programming and type systems to translate Scala backend code into formal Lean representations. Our pipeline automatically generates theorems that specify the intended behavior of APIs and database operations, and uses LLM-based provers to verify them. When a theorem is proved, the corresponding logic is guaranteed to be correct and no further testing is needed. If the negation of a theorem is proved instead, it confirms a bug. In cases where neither can be proved, human intervention is required. We evaluate our method on realistic backend systems and find that it can formally verify over 50% of the test requirements, which suggests that half of a testing engineer's workload can be automated. Additionally, with an average cost of only $2.19 per API, LLM-based verification is significantly more cost-effective than manual testing and can be scaled easily through parallel execution. Our results indicate a promising direction for scalable, AI-powered software testing, with the potential to greatly improve engineering productivity as models continue to advance.

large language model, machine learning, programming language, (26 more...)

2506.10998

Country:

Asia > China (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
(2 more...)

Beg, Arshad, O'Donoghue, Diarmuid, Monahan, Rosemary

Formalising Software Requirements using Large Language Models

This paper is a brief introduction to our recently initiated project named VERIFAI: Traceability and verification of natural language requirements. The project addresses the challenges in the traceability and verification of formal specifications through providing support for the automatic generation of the formal specifications and the traceability of the requirements from the initial software design stage through the systems implementation and verification. Approaches explored in this project include Natural Language Processing, use of ontologies to describe the software system domain, reuse of existing software artefacts from similar systems (i.e. through similarity based reuse) and large language models to identify and declare the specifications as well as use of artificial intelligence to guide the process.

large language model, logic & formal reasoning, specification, (18 more...)

2506.10704

Country:

Europe > Ireland (0.05)
Europe > Switzerland (0.04)
Africa > Middle East > Tunisia > Tunis Governorate > Tunis (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Think before You Simulate: Symbolic Reasoning to Orchestrate Neural Computation for Counterfactual Question Answering

Ishay, Adam, Yang, Zhun, Lee, Joohyung, Kang, Ilgu, Lim, Dongjae

Causal and temporal reasoning about video dynamics is a challenging problem. While neuro-symbolic models that combine symbolic reasoning with neural-based perception and prediction have shown promise, they exhibit limitations, especially in answering counterfactual questions. This paper introduces a method to enhance a neuro-symbolic model for counterfactual reasoning, leveraging symbolic reasoning about causal relations among events. We define the notion of a causal graph to represent such relations and use Answer Set Programming (ASP), a declarative logic programming method, to find how to coordinate perception and simulation modules. We validate the effectiveness of our approach on two benchmarks, CLEVRER and CRAFT. Our enhancement achieves state-of-the-art performance on the CLEVRER challenge, significantly outperforming existing models. In the case of the CRAFT benchmark, we leverage a large pre-trained language model, such as GPT-3.5 and GPT-4, as a proxy for a dynamics simulator. Our findings show that this method can further improve its performance on counterfactual questions by providing alternative prompts instructed by symbolic causal reasoning.

large language model, logic & formal reasoning, machine learning, (22 more...)

2506.10753

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Arizona (0.04)
Asia > Macao (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.75)

Bartholomew, Michael, Lee, Joohyung

System ASPMT2SMT:Computing ASPMT Theories by SMT Solvers

Answer Set Programming Modulo Theories (ASPMT) is an approach to combining answer set programming and satisfiability modulo theories based on the functional stable model semantics. It is shown that the tight fragment of ASPMT programs can be turned into SMT instances, thereby allowing SMT solvers to compute stable models of ASPMT programs. In this paper we present a compiler called {\sc aspsmt2smt}, which implements this translation. The system uses ASP grounder {\sc gringo} and SMT solver {\sc z3}. {\sc gringo} partially grounds input programs while leaving some variables to be processed by {\sc z3}. We demonstrate that the system can effectively handle real number computations for reasoning about continuous changes.

artificial intelligence, aspmt 2, logic & formal reasoning, (17 more...)

2506.10708

Country: North America > United States (0.68)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.93)

Bennetzen, Benjamin, Kristensen, Nikolaj Rossander, Steffensen, Peter Buus

Encoding call-by-push-value in the pi-calculus

In this report we define an encoding of Levys call-by-push-value lambda-calculus (CBPV) in the pi-calculus, and prove that our encoding is both sound and complete. We present informal (by-hand) proofs of soundness, completeness, and all required lemmas. The encoding is specialized to the internal pi-calculus (pi-i-calculus) to circumvent certain challenges associated with using de Bruijn index in a formalization, and it also helps with bisimulation as early-, late- and open-bisimulation coincide in this setting, furthermore bisimulation is a congruence. Additionally, we argue that our encoding also satisfies the five criteria for good encodings proposed by Gorla, as well as show similarities between Milners and our encoding. This paper includes encodings from CBPV in the pi-i-calculus, asynchronous polyadic pi-calculus and the local pi-calculus. We begin a formalization of the proof in Coq for the soundness and completeness of the encoding in the pi-i-calculus. Not all lemmas used in the formalization are themselves formally proven. However, we argue that the non-proven lemmas are reasonable, as they are proven by hand, or amount to Coq formalities that are straightforward given informal arguments.

artificial intelligence, logic & formal reasoning, reduction, (20 more...)

2506.10584

Country:

North America > United States (0.27)
Europe > United Kingdom (0.27)

Genre: Research Report (0.63)

Industry: Leisure & Entertainment > Sports (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.47)

Abavisani, Mohammadsajad, Solovyeva, Kseniya, Danks, David, Calhoun, Vince, Plis, Sergey

Causal Graph Recovery in Neuroimaging through Answer Set Programming

arXiv.org Artificial IntelligenceJun-12-2025

Learning graphical causal structures from time series data presents significant challenges, especially when the measurement frequency does not match the causal timescale of the system. This often leads to a set of equally possible underlying causal graphs due to information loss from sub-sampling (i.e., not observing all possible states of the system throughout time). Our research addresses this challenge by incorporating the effects of sub-sampling in the derivation of causal graphs, resulting in more accurate and intuitive outcomes. We use a constraint optimization approach, specifically answer set programming (ASP), to find the optimal set of answers. ASP not only identifies the most probable underlying graph, but also provides an equivalence class of possible graphs for expert selection. In addition, using ASP allows us to leverage graph theory to further prune the set of possible solutions, yielding a smaller, more accurate answer set significantly faster than traditional approaches. We validate our approach on both simulated data and empirical structural brain connectivity, and demonstrate its superiority over established methods in these experiments. We further show how our method can be used as a meta-approach on top of established methods to obtain, on average, 12% improvement in F1 score. In addition, we achieved state of the art results in terms of precision and recall of reconstructing causal graph from sub-sampled time series data. Finally, our method shows robustness to varying degrees of sub-sampling on realistic simulations, whereas other methods perform worse for higher rates of sub-sampling.

artificial intelligence, logic & formal reasoning, optimization problem, (18 more...)

2506.09286

Genre:

Research Report > New Finding (0.47)
Research Report > Promising Solution (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.48)

arXiv.org Artificial IntelligenceJun-11-2025

LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs

Patel, Manooshree, Bhattacharyya, Rayna, Lu, Thomas, Mehta, Arnav, Voss, Niels, Norouzi, Narges, Ranade, Gireeja

We present LeanTutor, a Large Language Model (LLM)-based tutoring system for math proofs. LeanTutor interacts with the student in natural language, formally verifies student-written math proofs in Lean, generates correct next steps, and provides the appropriate instructional guidance. LeanTutor is composed of three modules: (i) an autoformalizer/proof-checker, (ii) a next-step generator, and (iii) a natural language feedback generator. The first module faithfully autoformalizes student proofs into Lean and verifies proof accuracy via successful code compilation. If the proof has an error, the incorrect step is identified. The next-step generator module outputs a valid next Lean tactic for incorrect proofs via LLM-based candidate generation and proof search. The feedback generator module leverages Lean data to produce a pedagogically-motivated natural language hint for the student user. To evaluate our system, we introduce PeanoBench, a human-written dataset derived from the Natural Numbers Game, consisting of 371 Peano Arithmetic proofs, where each natural language proof step is paired with the corresponding logically equivalent tactic in Lean. The Autoformalizer correctly formalizes 57% of tactics in correct proofs and accurately identifies the incorrect step in 30% of incorrect proofs. In generating natural language hints for erroneous proofs, LeanTutor outperforms a simple baseline on accuracy and relevance metrics.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2506.08321

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry:

Education > Educational Setting (0.93)
Education > Curriculum > Subject-Specific Education (0.93)
Education > Educational Technology > Educational Software > Computer Based Training (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)