Multilingual Mathematical Autoformalization

Jiang, Albert Q., Li, Wenda, Jamnik, Mateja

Nov-9-2023–arXiv.org Artificial Intelligence

Autoformalization is the task of translating natural language materials into machine-verifiable formalisations. Progress in autoformalization research is hindered by the lack of a sizeable dataset consisting of informal-formal pairs expressing the same essence. Existing methods tend to circumvent this challenge by manually curating small corpora or using few-shot learning with large language models. But these methods suffer from data scarcity and formal language acquisition difficulty. In this work, we create $\texttt{MMA}$, a large, flexible, multilingual, and multi-domain dataset of informal-formal pairs, by using a language model to translate in the reverse direction, that is, from formal mathematical statements into corresponding informal ones. Experiments show that language models fine-tuned on $\texttt{MMA}$ produce $16-18\%$ of statements acceptable with minimal corrections on the $\texttt{miniF2F}$ and $\texttt{ProofNet}$ benchmarks, up from $0\%$ with the base model. We demonstrate that fine-tuning on multilingual formal data results in more capable autoformalization models even when deployed on monolingual tasks.

formal language, formalisation, language model, (15 more...)

arXiv.org Artificial Intelligence

Nov-9-2023

arXiv.org PDF

Add feedback

Country:
- North America
  - United States
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - California > Santa Clara County
      - Palo Alto (0.04)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.04)
- Europe
  - Germany > Berlin (0.04)
  - Ireland (0.04)
  - Austria (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.14)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
- Asia > Japan
  - Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre:
- Instructional Material (0.46)
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Logic & Formal Reasoning (1.00)
  - Natural Language > Large Language Model (0.91)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found