Analyzing Generalization in Pre-Trained Symbolic Regression
Voigt, Henrik, Kahlmeyer, Paul, Lawonn, Kai, Habeck, Michael, Giesen, Joachim
Symbolic regression algorithms search a space of mathematical expressions for formulas that explain given data. Transformer-based models have emerged as a promising, scalable approach that shifts the expensive combinatorial search to a large-scale pre-training phase. However, the success of these models is critically dependent on their pre-training data, and their ability to generalize to problems outside of this pre-training distribution remains largely unexplored. In this work, we conduct a systematic empirical study to evaluate the generalization capabilities of pre-trained, transformer-based symbolic regression. We rigorously test several state-of-the-art approaches both within the pre-training distribution and on a series of out-of-distribution challenges. Our findings reveal a significant dichotomy: while pre-trained models perform well in-distribution, their performance consistently degrades in out-of-distribution scenarios. We conclude that this generalization gap is a critical barrier for practitioners, as it severely limits the practical use of pre-trained approaches in real-world applications.
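To make the evaluation idea concrete, the sketch below illustrates the kind of in-distribution versus out-of-distribution comparison the abstract describes. It is not the paper's benchmark protocol: the ground-truth function, input ranges, and the polynomial surrogate standing in for a pre-trained symbolic regression model's predicted formula are all illustrative assumptions.

```python
import numpy as np

# Hypothetical ground-truth formula the data is sampled from.
def ground_truth(x):
    return np.sin(x) + 0.5 * x

rng = np.random.default_rng(0)

# In-distribution inputs: x drawn from [-2, 2].
x_train = rng.uniform(-2, 2, 200)
y_train = ground_truth(x_train) + rng.normal(0.0, 0.05, x_train.shape)

# Stand-in for a symbolic-regression prediction: a degree-5 polynomial fit,
# used only to illustrate the evaluation, not any pre-trained model.
predict = np.poly1d(np.polyfit(x_train, y_train, deg=5))

def r2(y_true, y_pred):
    # Coefficient of determination, a common symbolic-regression fit metric.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# In-distribution test: fresh points from the same input range.
x_in = rng.uniform(-2, 2, 200)
print("in-distribution R^2:     ", r2(ground_truth(x_in), predict(x_in)))

# Out-of-distribution test: inputs outside the range seen during fitting;
# a model that only interpolates its training range degrades here.
x_out = rng.uniform(4, 6, 200)
print("out-of-distribution R^2: ", r2(ground_truth(x_out), predict(x_out)))
```

Running the sketch typically shows a near-perfect R^2 on the in-distribution points and a sharply lower (often negative) R^2 on the shifted range, which is the generalization gap the study measures for pre-trained models.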
arXiv.org Artificial Intelligence
Sep-25-2025