Rethinking Symbolic Regression Datasets and Benchmarks for Scientific Discovery

Jun-23-2022, 16:38:40 GMT–#artificialintelligence

This paper revisits datasets and evaluation criteria for Symbolic Regression (SR), the task of expressing given data with mathematical equations, with a specific focus on its potential for scientific discovery. Focusing on the set of formulas used in existing datasets based on the Feynman Lectures on Physics, we recreate 120 datasets to discuss the performance of symbolic regression for scientific discovery (SRSD). For each of the 120 SRSD datasets, we carefully review the properties of the formula and its variables to design reasonably realistic sampling ranges of values, so that our new SRSD datasets can be used to evaluate the potential of SRSD, such as whether or not an SR method can (re)discover physical laws from such datasets. As an evaluation metric, we also propose using normalized edit distances between the predicted and ground-truth equation trees. Existing metrics are either binary or measure the error between the target values and an SR model's predicted values for a given input; normalized edit distances instead evaluate the structural similarity between the ground-truth and predicted equation trees.

dataset, rethinking symbolic regression dataset, symbolic regression dataset and benchmark, (6 more...)

News Web Page

Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.93)
