Multilingual Mathematical Autoformalization
Jiang, Albert Q., Li, Wenda, Jamnik, Mateja
–arXiv.org Artificial Intelligence
Autoformalization is the task of translating natural language materials into machine-verifiable formalisations. Progress in autoformalization research is hindered by the lack of a sizeable dataset consisting of informal-formal pairs expressing the same essence. Existing methods tend to circumvent this challenge by manually curating small corpora or using few-shot learning with large language models. But these methods suffer from data scarcity and formal language acquisition difficulty. In this work, we create $\texttt{MMA}$, a large, flexible, multilingual, and multi-domain dataset of informal-formal pairs, by using a language model to translate in the reverse direction, that is, from formal mathematical statements into corresponding informal ones. Experiments show that language models fine-tuned on $\texttt{MMA}$ produce $16-18\%$ of statements acceptable with minimal corrections on the $\texttt{miniF2F}$ and $\texttt{ProofNet}$ benchmarks, up from $0\%$ with the base model. We demonstrate that fine-tuning on multilingual formal data results in more capable autoformalization models even when deployed on monolingual tasks.
arXiv.org Artificial Intelligence
Nov-9-2023
- Country:
- North America
- Europe
- Germany > Berlin (0.04)
- Ireland (0.04)
- Austria (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.14)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Asia > Japan
- Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- Genre:
- Instructional Material (0.46)
- Research Report (0.40)
- Technology: