Can LLMs Enable Verification in Mainstream Programming?

Shefer, Aleksandr, Engel, Igor, Alekseev, Stanislav, Berezun, Daniil, Verbitskaia, Ekaterina, Podkopaev, Anton

Mar-18-2025–arXiv.org Artificial Intelligence

Although formal methods are capable of producing reliable software, they have seen minimal adoption in everyday programming. Automatic code generation using large language models is becoming increasingly widespread, but it rarely considers producing strong correctness guarantees. In this study, we explore the ability of LLMs to produce verified code in three verification languages (Dafny, Nagini, and Verus). To do so, we use manually curated datasets derived from the state-of-the-art Python benchmark, HumanEval. We also assess what types of information are sufficient to achieve good-quality results.

large language model, machine learning, specification

Country:
- North America > United States
  - Florida > Pinellas County > St. Petersburg (0.04)
- Europe
  - Middle East > Cyprus (0.04)
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
  - Germany > Bremen
    - Bremen (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.47)
