From Informal to Formal -- Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs

Cao, Jialun, Lu, Yaojie, Li, Meiziniu, Ma, Haoyang, Li, Haokun, He, Mengda, Wen, Cheng, Sun, Le, Zhang, Hongyu, Qin, Shengchao, Cheung, Shing-Chi, Tian, Cong

Feb-17-2025–arXiv.org Artificial Intelligence

The research in AI-based formal mathematical reasoning has shown an unstop- pable growth trend. These studies have excelled in mathematical competitions like IMO and have made significant progress. This paper focuses on formal verification, an immediate application scenario of formal reasoning, and breaks it down into sub-tasks. We constructed 18k high-quality instruction-response pairs across five formal specification languages (Coq, Lean4, Dafny, ACSL, and TLA+) by distilling gpt-4o and evaluated against ten open-sourced LLMs, including recent popular DeepSeek-R1. We also fine-tuned several 7~8B small models to achieve comparable performance with Deepseek-R1-671B. Interestingly, we observed that fine-tuning with formal data also enhances mathematics, reasoning, and coding capabilities. Fine-tuned models are released at https: //huggingface.co/fm-universe.

large language model, machine learning, specification, (18 more...)

arXiv.org Artificial Intelligence

Feb-17-2025

arXiv.org PDF

Add feedback

Country:
- Asia (0.92)
- North America > United States (0.68)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language
    - Chatbot (1.00)
    - Large Language Model (1.00)