A Symbolic Framework for Systematic Evaluation of Mathematical Reasoning with Transformers
Meadows, Jordan, Valentino, Marco, Teney, Damien, Freitas, Andre
arXiv.org Artificial Intelligence
Whether Transformers can learn to apply symbolic rules and generalise to out-of-distribution examples is an open research question. In this paper, we devise a data generation method for producing intricate mathematical derivations and systematically perturb them with respect to syntax, structure, and semantics. Our task-agnostic approach generates equations, annotations, and inter-equation dependencies, employing symbolic algebra for scalable data production and augmentation. We then instantiate a general experimental framework on next-equation prediction, assessing systematic mathematical reasoning and generalisation of Transformer encoders on a total of 200K examples. The experiments reveal that perturbations heavily affect performance and can reduce F1 scores from $97\%$ to below $17\%$, suggesting that inference is dominated by surface-level patterns unrelated to a deeper understanding of mathematical operators. These findings underscore the importance of rigorous, large-scale evaluation frameworks for revealing the fundamental limitations of existing models.
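The abstract describes generating derivation steps with a computer algebra system and then perturbing them. A minimal sketch of this idea, assuming SymPy as the algebra system (the equation, operator choice, and perturbation here are illustrative, not the authors' actual pipeline):

```python
import sympy as sp

x = sp.Symbol('x')

# Premise equation: y = sin(x) * exp(x)
rhs = sp.sin(x) * sp.exp(x)
premise = sp.Eq(sp.Symbol('y'), rhs)

# Derive the next step by applying an operator to the right-hand side
# (here: differentiation), yielding an inter-equation dependency.
derived = sp.Eq(sp.Symbol("y'"), sp.diff(rhs, x))

# A simple semantic perturbation: swap the operator (integrate instead
# of differentiate) so the surface form stays similar while the
# underlying mathematics changes.
perturbed = sp.Eq(sp.Symbol("y'"), sp.integrate(rhs, x))

print(premise)
print(derived)
print(perturbed)
```

A next-equation prediction example would then pair the premise with `derived` as the positive continuation, using `perturbed` (and its syntactic/structural variants) as hard negatives.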
May-21-2023