isabelle
FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving
Formal verification (FV) has grown in significance as evolving large language models (LLMs) drive program synthesis. However, current formal verification relies mainly on symbolic verifiers or hand-crafted rules, which limits how extensive and flexible verification can be. On the other hand, formal languages for automated theorem proving, such as Isabelle, offer another line of rigorous verification and are maintained with comprehensive rules and theorems. In this paper, we propose FVEL, an interactive Formal Verification Environment with LLMs. Specifically, FVEL transforms the code to be verified into Isabelle and then conducts verification via neural automated theorem proving with an LLM. This joint paradigm leverages the rigorous yet abundant formulated and organized rules in Isabelle and also makes it convenient to introduce and adjust cutting-edge LLMs. To achieve this goal, we extract a large-scale dataset, FVELER. The FVELER dataset includes code dependencies and verification processes formulated in Isabelle, containing 758 theories, 29,304 lemmas, and 201,498 proof steps in total with in-depth dependencies.
Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers
In theorem proving, the task of selecting useful premises from a large library to unlock the proof of a given conjecture is crucially important. This presents a challenge for all theorem provers, especially the ones based on language models, due to their relative inability to reason over huge volumes of premises in text form.
The Brilliant New Movie About Alexander Skarsgård Making Dudley Dursley His Toy
Fans of will be happy to hear that there's been another entry into the world of scintillating gay romance. The film stars noted on-screen sex haver Alexander Skarsgård--he's equally provocative in the NC-17-rated --and some guy named Harry Melling, who seems to have been in . Melling plays Colin, a certified beta whose deepest desire is to serve. He gets his wish when he meets Ray (Skarsgård), a toppy, Tom of Finland-esque biker with an attitude so icy it could preserve food. The two enter into a full-time power-exchange relationship that fuels both of their desires, until their connection evolves to a heart-wrenching breaking point. Unlike other recent films about kink that were bound and gagged by their own corniness--think and -- has been lauded as realistic, sophisticated, and smart, and the movie is currently sitting at 100 percent on Rotten Tomatoes. Still, was it enough to satisfy senior editor Isabelle Kohn and How to Do It columnist Rich Juzwiak? Be a good boy and find out.
Evaluating Autoformalization Robustness via Semantically Similar Paraphrasing
Large Language Models (LLMs) have recently emerged as powerful tools for autoformalization. Despite their impressive performance, these models can still struggle to produce grounded and verifiable formalizations. Recent work in text-to-SQL has revealed that LLMs can be sensitive to paraphrased natural language (NL) inputs, even when high degrees of semantic fidelity are preserved (Safarzadeh, Oroojlooyjadid, and Roth 2025). In this paper, we investigate this claim in the autoformalization domain. Specifically, we evaluate the robustness of LLMs that generate formal proofs from semantically similar paraphrased NL statements by measuring semantic and compilation validity. Using the formal benchmarks MiniF2F (Zheng, Han, and Polu 2021) and the Lean 4 version of ProofNet (Xin et al. 2024), and two modern LLMs, we generate paraphrased natural language statements and cross-evaluate these statements across both models. The results reveal performance variability across paraphrased inputs, demonstrating that minor shifts in NL statements can significantly impact model outputs.