QEDCartographer: Automating Formal Verification Using Reward-Free Reinforcement Learning
Sanchez-Stern, Alex, Varghese, Abhishek, Kaufman, Zhanna, Zhang, Dylan, Ringer, Talia, Brun, Yuriy
–arXiv.org Artificial Intelligence
Formal verification is a promising method for producing reliable software, but the difficulty of manually writing verification proofs severely limits its utility in practice. Recent methods have automated some proof synthesis by guiding a search through the proof space using a theorem prover. Unfortunately, the theorem prover provides only the crudest estimate of progress, resulting in effectively undirected search. To address this problem, we create QEDCartographer, an automated proof-synthesis tool that combines supervised and reinforcement learning to more effectively explore the proof space. QEDCartographer incorporates the proofs' branching structure, enabling reward-free search and overcoming the sparse reward problem inherent to formal verification. We evaluate QEDCartographer using the CoqGym benchmark of 68.5K theorems from 124 open-source Coq projects. QEDCartographer fully automatically proves 21.4% of the test-set theorems. Previous search-based proof-synthesis tools Tok, Tac, ASTactic, Passport, and Proverbot9001, which rely only on supervised learning, prove 9.6%, 9.8%, 10.9%, 12.5%, and 19.8%, respectively. Diva, which combines 62 tools, proves 19.2%. Comparing to the most effective prior tool, Proverbot9001, QEDCartographer produces 34% shorter proofs 29% faster, on average over the theorems both tools prove. Together, QEDCartographer and non-learning-based CoqHammer prove 30.3% of the theorems, while CoqHammer alone proves 26.6%. Our work demonstrates that reinforcement learning is a fruitful research direction for improving proof-synthesis tools' search mechanisms.
arXiv.org Artificial Intelligence
Sep-12-2024
- Country:
- South America > Chile
- Oceania > Australia
- Victoria > Melbourne (0.04)
- New South Wales > Sydney (0.04)
- North America
- United States
- Maryland > Baltimore (0.04)
- Texas (0.04)
- Illinois > Champaign County
- South Carolina > Charleston County
- Charleston (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Oregon > Multnomah County
- Portland (0.04)
- Florida > Pinellas County
- St. Petersburg (0.04)
- Washington > King County
- Alaska > Anchorage Municipality
- Anchorage (0.04)
- Massachusetts
- Hampshire County > Amherst (0.14)
- Suffolk County > Boston (0.04)
- California
- New York > New York County
- New York City (0.04)
- Pennsylvania > Allegheny County
- Pittsburgh (0.04)
- Canada
- Quebec > Montreal (0.04)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- United States
- Europe
- France (0.04)
- Italy (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Russia > Northwestern Federal District
- Leningrad Oblast > Saint Petersburg (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Netherlands > North Holland
- Amsterdam (0.04)
- Greece > Attica
- Athens (0.04)
- Germany > Saarland
- Saarbrücken (0.04)
- Asia
- Russia (0.04)
- Middle East > Jordan (0.04)
- South Korea > Seoul
- Seoul (0.04)
- China > Beijing
- Beijing (0.04)
- Africa > Senegal
- Dakar Region > Dakar (0.04)
- Genre:
- Research Report
- Experimental Study (0.46)
- Promising Solution (0.34)
- Research Report
- Industry:
- Information Technology (0.46)
- Technology: