Appendix A Dataset Details
–Neural Information Processing Systems
We evaluate TPSR and several baseline methods on four standard benchmark datasets: Feynman, Black-box, and Strogatz from SRBench [42], and In-domain Synthetic Data generated following [18]. More details on each of these datasets are given below.

Feynman. The regression input points (x, y) from these equations are provided in the Penn Machine Learning Benchmark (PMLB) [42, 43] and have been studied in SRBench [42] for the symbolic regression (SR) task. The input dimension is limited to d ≤ 10, and the true underlying function of the points is known. When a problem has more than 200 input points (N > 200), we split the dataset into B bags of 200 input points each, since the transformer SR model is pretrained on N ≤ 200 input points as per [18].

Strogatz. The input points for these problems are included in PMLB [43] and have been examined in SRBench [42] for symbolic regression. The input dimension for these problems is restricted to d = 2, and the true underlying functions are provided.

Black-box. The aim of SR on these black-box datasets is to find an interpretable model expression that fits the data effectively.
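The split into bags of 200 input points described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `split_into_bags`, the shuffling before the split, and the handling of a smaller final bag are all assumptions made for illustration, since the text only states that the data is split into B bags of 200 points when N is larger than 200.

```python
import random


def split_into_bags(points, bag_size=200, seed=0):
    """Split a list of (x, y) points into bags of at most `bag_size` points.

    Hypothetical helper. Assumptions not stated in the text:
    - points are shuffled (with a fixed seed) before splitting;
    - a final bag with fewer than `bag_size` points is kept, not dropped.
    """
    points = list(points)  # copy so the caller's data is not reordered
    if len(points) <= bag_size:
        # N <= 200: the model can consume the whole dataset as one bag
        return [points]
    rng = random.Random(seed)
    rng.shuffle(points)
    # Consecutive slices of the shuffled list form the B bags
    return [points[i:i + bag_size] for i in range(0, len(points), bag_size)]


if __name__ == "__main__":
    # Toy dataset: 450 points with a 2-dimensional input and a scalar target
    data = [((float(i), float(i) ** 2), float(i) + 1.0) for i in range(450)]
    bags = split_into_bags(data)
    print(len(bags), [len(b) for b in bags])
```

Under these assumptions, a dataset with N = 450 points yields B = 3 bags, and any dataset with N ≤ 200 is passed through as a single bag.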
Mar-27-2025, 12:09:15 GMT