Goto

Collaborating Authors

 symbolic execution






Terra: Imperative-SymbolicCo-Executionof ImperativeDeepLearningPrograms

Neural Information Processing Systems

On the other hand, in the latter model, the Python interpreter embeds DL operations into a symbolic graph that represents the entire dataflow of aDNN. Thus, users should define their DL programs only with existing symbolic operations thatDLframeworks support.



Can Large Language Models Autoformalize Kinematics?

Kabra, Aditi, Laurent, Jonathan, Bharadwaj, Sagar, Martins, Ruben, Mitsch, Stefan, Platzer, André

arXiv.org Artificial Intelligence

Autonomous cyber-physical systems like robots and self-driving cars could greatly benefit from using formal methods to reason reliably about their control decisions. However, before a problem can be solved it needs to be stated. This requires writing a formal physics model of the cyber-physical system, which is a complex task that traditionally requires human expertise and becomes a bottleneck. This paper experimentally studies whether Large Language Models (LLMs) can automate the formalization process. A 20 problem benchmark suite is designed drawing from undergraduate level physics kinematics problems. In each problem, the LLM is provided with a natural language description of the objects' motion and must produce a model in differential game logic (dGL). The model is (1) syntax checked and iteratively refined based on parser feedback, and (2) semantically evaluated by checking whether symbolically executing the dGL formula recovers the solution to the original physics problem. A success rate of 70% (best over 5 samples) is achieved. We analyze failing cases, identifying directions for future improvement. This provides a first quantitative baseline for LLM-based autoformalization from natural language to a hybrid games logic with continuous dynamics.



Compiling by Proving: Language-Agnostic Automatic Optimization from Formal Semantics

Zhao, Jianhong, Hildenbrandt, Everett, Conejero, Juan, Zhao, Yongwang

arXiv.org Artificial Intelligence

Verification proofs encode complete program behavior, yet we discard them after checking correctness. We present compiling by proving, a paradigm that transforms these proofs into optimized execution rules. By constructing All-Path Reachability Proofs through symbolic execution and compiling their graph structure, we consolidate many semantic rewrites into single rules while preserving correctness by construction. We implement this as a language-agnostic extension to the K framework. Evaluation demonstrates performance improvements across different compilation scopes: opcode-level optimizations show consistent speedups, while whole-program compilation achieves orders of magnitude greater performance gains.


AutoStub: Genetic Programming-Based Stub Creation for Symbolic Execution

Mächtle, Felix, Loose, Nils, Serr, Jan-Niclas, Sander, Jonas, Eisenbarth, Thomas

arXiv.org Artificial Intelligence

Symbolic execution is a powerful technique for software testing, but suffers from limitations when encountering external functions, such as native methods or third-party libraries. Existing solutions often require additional context, expensive SMT solvers, or manual intervention to approximate these functions through symbolic stubs. In this work, we propose a novel approach to automatically generate symbolic stubs for external functions during symbolic execution that leverages Genetic Programming. When the symbolic executor encounters an external function, AutoStub generates training data by executing the function on randomly generated inputs and collecting the outputs. Genetic Programming then derives expressions that approximate the behavior of the function, serving as symbolic stubs. These automatically generated stubs allow the symbolic executor to continue the analysis without manual intervention, enabling the exploration of program paths that were previously intractable. We demonstrate that AutoStub can automatically approximate external functions with over 90% accuracy for 55% of the functions evaluated, and can infer language-specific behaviors that reveal edge cases crucial for software testing.