Autoformalizer with Tool Feedback

Guo, Qi, Wang, Jianing, Zhang, Jianfei, Kong, Deyang, Huang, Xiangzhou, Xi, Xiangyu, Wang, Wei, Wang, Jingang, Cai, Xunliang, Zhang, Shikun, Ye, Wei

arXiv.org Artificial Intelligence 

Autoformalization addresses the scarcity of data for Automated Theorem Proving (A TP) by translating mathematical problems from natural language into formal statements. However, existing formalizer still struggles to consistently generate valid statements that meet syntactic validity and semantic consistency. To address this issue, we propose the Autoformalizer with Tool Feedback (A TF), a novel approach that incorporates syntactic and consistency information as tools into the formalization process. By integrating Lean 4 compilers for syntax corrections and employing a multi-LLMs-as-judge approach for consistency validation, the model is able to adaptively refine generated statements according to the tool feedback, enhancing both syntactic validity and semantic consistency. The training of A TF involves a cold-start phase on synthetic tool-calling data, an expert iteration phase to improve formalization capabilities, and Direct Preference Optimization to alleviate ineffective revisions. Experimental results show that A TF markedly outperforms a range of baseline formalizer models, with its superior performance further validated by human evaluations. Subsequent analysis reveals that A TF demonstrates excellent inference scaling properties. Recent advancements in the reasoning capabilities of large language models have significantly accelerated progress in the field of Automated Theorem Proving (A TP) (Y ang et al., 2024). Unlike traditional mathematical tasks, A TP requires models to start from a formalized theorem statement and construct rigorous logical proofs that can be verified within formal languages such as Lean (De Moura et al., 2015) and Isabelle (Paulson, 1994). However, the training of recent massive provers, such as DeepSeek-Prover (Ren et al., 2025) and Kimina-Prover (Wang et al., 2025), is hindered by the scarcity of formalized mathematical queries. Autoformalization addresses this by translating mathematical problems expressed in natural language into verifiable formal statements. A significant challenge in autoformalization is the absence of a universal automatic evaluation standard (Li et al., 2024b).