Conjecturing: An Overlooked Step in Formal Mathematical Reasoning

Sivakumar, Jasivan Alex, Borchert, Philipp, Cardenas, Ronald, Lampouras, Gerasimos

Oct-15-2025–arXiv.org Artificial Intelligence

Autoformalisation, the task of expressing informal mathematical statements in formal language, is often viewed as a direct translation process. Many mathematical problems cannot be formalised directly without first conjecturing a conclusion such as an explicit answer, or a specific bound. Since Large Language Models (LLMs) already struggle with autoformalisation, and the evaluation of their conjecturing ability is limited and often entangled within autoformalisation or proof, it is particularly challenging to understand its effect. To address this gap, we augment existing datasets to create ConjectureBench, and redesign the evaluation framework and metric specifically to measure the conjecturing capabilities of LLMs both as a distinct task and within the autoformalisation pipeline. However, the conjecture should not be assumed to be provided. We demonstrate that while LLMs possess the requisite knowledge to generate accurate conjectures, improving autoformali-sation performance requires treating conjecturing as an independent task, and investigating further how to correctly integrate it within autoformalisation. Finally, we provide forward-looking guidance to steer future research toward improving conjecturing, an overlooked step of formal mathematical reasoning. Natural language reasoning with Large Language Models (LLMs) has emerged as a powerful tool for solving complex mathematical problems. Its effectiveness is highlighted by recent breakthroughs, such as AI systems from OpenAI and Google solving five of six problems from the 2025 International Mathematics Olympiad (IMO) using natural language (Metz, 2025). The critical caveat is that these informal solutions require validation by expert mathematicians, a process that is prone to human error and lack scalability (Gou ezel & Shchur, 2019). Proof assistants like Isabelle (Wenzel et al., 2008) and Lean (Moura & Ullrich, 2021) provide a path toward automated verification at scale through formal reasoning.

conjecture, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

Oct-15-2025

arXiv.org PDF

Add feedback

Country:
- Europe (0.28)

Genre:
- Research Report (0.83)

Industry:
- Leisure & Entertainment > Sports (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.90)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found