A Symbolic Framework for Systematic Evaluation of Mathematical Reasoning with Transformers
Meadows, Jordan, Valentino, Marco, Teney, Damien, Freitas, Andre
arXiv.org Artificial Intelligence
Whether Transformers can learn to apply symbolic rules and generalise to out-of-distribution examples is an open research question. In this paper, we devise a data generation method for producing intricate mathematical derivations and systematically perturb them with respect to syntax, structure, and semantics. Our task-agnostic approach generates equations, annotations, and inter-equation dependencies, employing symbolic algebra for scalable data production and augmentation. We then instantiate a general experimental framework on next-equation prediction, assessing systematic mathematical reasoning and generalisation of Transformer encoders on a total of 200K examples. The experiments reveal that perturbations heavily affect performance and can reduce F1 scores from $97\%$ to below $17\%$, suggesting that inference is dominated by surface-level patterns unrelated to a deeper understanding of mathematical operators. These findings underscore the importance of rigorous, large-scale evaluation frameworks for revealing the fundamental limitations of existing models.
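The abstract describes generating derivation steps with a computer algebra system and then perturbing them. A minimal sketch of this idea, assuming SymPy as the algebra system (the equation, operator choice, and perturbation here are illustrative, not the authors' actual pipeline):

```python
import sympy as sp

x = sp.Symbol('x')

# Premise equation: y = sin(x) * exp(x)
rhs = sp.sin(x) * sp.exp(x)
premise = sp.Eq(sp.Symbol('y'), rhs)

# Derive the next step by applying an operator to the right-hand side
# (here: differentiation), yielding an inter-equation dependency.
derived = sp.Eq(sp.Symbol("y'"), sp.diff(rhs, x))

# A simple semantic perturbation: swap the operator (integrate instead
# of differentiate) so the surface form stays similar while the
# underlying mathematics changes.
perturbed = sp.Eq(sp.Symbol("y'"), sp.integrate(rhs, x))

print(premise)
print(derived)
print(perturbed)
```

A next-equation prediction example would then pair the premise with `derived` as the positive continuation, using `perturbed` (and its syntactic/structural variants) as hard negatives.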
May-21-2023