Verus


VeriStruct: AI-assisted Automated Verification of Data-Structure Modules in Verus

Sun, Chuyue, Sun, Yican, Amrollahi, Daneshvar, Zhang, Ethan, Lahiri, Shuvendu, Lu, Shan, Dill, David, Barrett, Clark

arXiv.org Artificial Intelligence

We introduce VeriStruct, a novel framework that extends AI-assisted automated verification from single functions to more complex data structure modules in Verus. VeriStruct employs a planner module to orchestrate the systematic generation of abstractions, type invariants, specifications, and proof code. To address the challenge that LLMs often misunderstand Verus' annotation syntax and verification-specific semantics, VeriStruct embeds syntax guidance within prompts and includes a repair stage to automatically correct annotation errors. In an evaluation on eleven Rust data structure modules, VeriStruct succeeds on ten of the eleven, successfully verifying 128 out of 129 functions (99.2%) in total. These results represent an important step toward the goal of automatic AI-assisted formal verification.
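As an illustration of the annotation kinds the abstract names (abstractions, type invariants, specifications), here is a minimal Verus-style sketch of a hypothetical module. It is not taken from the paper and requires the Verus toolchain and `vstd`, not plain rustc; names and exact `vstd` signatures are assumptions.

```rust
use vstd::prelude::*;

verus! {

// Hypothetical bounded counter. The annotations mirror what VeriStruct
// generates: a type invariant (wf), and a specification (requires/ensures)
// whose proof obligations the Verus verifier discharges.
pub struct Counter {
    count: u64,
    max: u64,
}

impl Counter {
    // Type invariant: the count never exceeds the bound.
    pub closed spec fn wf(&self) -> bool {
        self.count <= self.max
    }

    pub fn increment(&mut self)
        requires
            old(self).wf(),
            old(self).count < old(self).max,
        ensures
            self.wf(),
            self.count == old(self).count + 1,
    {
        self.count = self.count + 1;
    }
}

} // verus!
```

In a real data-structure module, the abstraction would also include a `view` function relating the concrete representation to a mathematical object (e.g., a `Seq`), which the specifications are then written against.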


Can LLMs Enable Verification in Mainstream Programming?

Shefer, Aleksandr, Engel, Igor, Alekseev, Stanislav, Berezun, Daniil, Verbitskaia, Ekaterina, Podkopaev, Anton

arXiv.org Artificial Intelligence

Although formal methods are capable of producing reliable software, they have seen minimal adoption in everyday programming. Automatic code generation using large language models is becoming increasingly widespread, but it rarely considers producing strong correctness guarantees. In this study, we explore the ability of LLMs to produce verified code in three verification languages (Dafny, Nagini, and Verus). To do so, we use manually curated datasets derived from the state-of-the-art Python benchmark, HumanEval. We also assess what types of information are sufficient to achieve good-quality results.


Automated Proof Generation for Rust Code via Self-Evolution

Chen, Tianyu, Lu, Shuai, Lu, Shan, Gong, Yeyun, Yang, Chenyuan, Li, Xuheng, Misu, Md Rakib Hossain, Yu, Hao, Duan, Nan, Cheng, Peng, Yang, Fan, Lahiri, Shuvendu K, Xie, Tao, Zhou, Lidong

arXiv.org Artificial Intelligence

Ensuring correctness is crucial for code generation. Formal verification offers a definitive assurance of correctness, but demands substantial human effort in proof construction and hence raises a pressing need for automation. The primary obstacle lies in the severe lack of data -- there are far fewer proofs than code for LLMs to train upon. In this paper, we introduce SAFE, a novel framework that overcomes the lack of human-written proofs to enable automated proof generation for Rust code. SAFE establishes a self-evolving cycle where data synthesis and fine-tuning collaborate to enhance the model's capability, leveraging the definitive power of a symbolic verifier in telling correct proofs from incorrect ones. SAFE also re-purposes the large number of synthesized incorrect proofs to train the self-debugging capability of the fine-tuned models, empowering them to fix incorrect proofs based on the verifier's feedback. SAFE demonstrates superior efficiency and precision compared to GPT-4o. Through tens of thousands of synthesized proofs and the self-debugging mechanism, we improve the capability of open-source models, initially unacquainted with formal verification, to automatically write proofs for Rust code. This advancement leads to a significant improvement in performance, achieving a 70.50% accuracy rate on a benchmark crafted by human experts, a significant leap over GPT-4o's performance of 24.46%.

Large Language Models (LLMs) have recently exhibited impressive capabilities in code generation (Roziere et al., 2023; Guo et al., 2024; Lozhkov et al., 2024; Google, 2024). However, the correctness of generated code cannot be guaranteed. To tackle this issue, prior research (Chen et al., 2022; Zhang et al., 2024a; Huang et al., 2023) has explored assessing generated code with test cases, which are sometimes also generated by LLMs. Although helpful, testing cannot cover all possible program inputs; tests can reveal the presence of bugs but cannot prove their absence (Dahl et al., 1972).


Leveraging Large Language Models for Automated Proof Synthesis in Rust

Yao, Jianan, Zhou, Ziqiao, Chen, Weiteng, Cui, Weidong

arXiv.org Artificial Intelligence

Formal verification can provably guarantee the correctness of critical system software, but the high proof burden has long hindered its wide adoption. Recently, Large Language Models (LLMs) have shown success in code analysis and synthesis. In this paper, we present a combination of LLMs and static analysis to synthesize invariants, assertions, and other proof structures for a Rust-based formal verification framework called Verus. In a few-shot setting, LLMs demonstrate impressive logical ability in generating postconditions and loop invariants, especially when analyzing short code snippets. However, LLMs lack the ability to retain and propagate context information, a strength of traditional static analysis. Based on these observations, we developed a prototype based on OpenAI's GPT-4 model. Our prototype decomposes the verification task into multiple smaller ones, iteratively queries GPT-4, and combines its output with lightweight static analysis. We evaluated the prototype with a developer in the automation loop on 20 vector-manipulating programs. The results demonstrate that it significantly reduces human effort in writing entry-level proof code.
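To make concrete what "postconditions and loop invariants" look like in this setting, here is a hedged sketch of a vector-manipulating function with the proof structures such a prototype would synthesize. It is an illustration, not code from the paper; it requires the Verus toolchain, and the exact `vstd` view (`@`) idioms are assumptions.

```rust
use vstd::prelude::*;

verus! {

// Hypothetical example: find the index of a maximum element.
// The ensures clause is the synthesized postcondition; the invariant
// clauses are the synthesized loop invariants the verifier needs.
fn max_index(v: &Vec<u64>) -> (idx: usize)
    requires
        v@.len() > 0,
    ensures
        idx < v@.len(),
        forall|i: int| 0 <= i < v@.len() ==> v@[i] <= v@[idx as int],
{
    let mut idx: usize = 0;
    let mut j: usize = 1;
    while j < v.len()
        invariant
            idx < j,
            j <= v@.len(),
            forall|i: int| 0 <= i < j ==> v@[i] <= v@[idx as int],
    {
        if v[j] > v[idx] {
            idx = j;
        }
        j = j + 1;
    }
    idx
}

} // verus!
```

The division of labor the abstract describes maps onto this example: the LLM is good at proposing the `ensures` and `invariant` clauses from a short snippet, while static analysis supplies and propagates context (e.g., bounds on `idx` and `j`) across the decomposed queries.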


AI surveillance takes U.S. prisons by storm

AITopics Custom Links

LOS ANGELES/WASHINGTON, Nov 15 (Thomson Reuters Foundation) - When the sheriff in Suffolk County, New York, requested $700,000 from the U.S. government for an artificial intelligence system to eavesdrop on prison phone conversations, his office called it a key tool in fighting gang-related and violent crime. But the county jail ended up listening to calls involving a much wider range of subjects - scanning as many as 600,000 minutes per month, according to public records from the county obtained by the Thomson Reuters Foundation. Beginning in 2019, Suffolk County was an early pilot site for the Verus AI-scanning system sold by California-based LEO Technologies, which uses Amazon speech-to-text technology to transcribe phone calls flagged by keyword searches. The company and law enforcement officials say it is a crucial tool to keep prisons and jails safe and to fight crime, but critics say such systems trample the privacy rights of prisoners and of other people, like family members, on the outside. "The ability to surveil and listen at scale in this rapid way - it is incredibly scary and chilling," said Julie Mao, deputy director at Just Futures Law, an immigration legal group.

