A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data

May-27-2025, 12:34:01 GMT–Neural Information Processing Systems

Tabular data is prevalent in real-world machine learning applications, and new models for supervised learning on tabular data are frequently proposed. Comparative studies that assess performance differences typically use model-centered evaluation setups with overly standardized data preprocessing. This limits their external validity, since in real-world modeling pipelines, models are typically applied after dataset-specific preprocessing and feature engineering. We address this gap by proposing a data-centric evaluation framework: we select 10 relevant datasets from Kaggle competitions and implement expert-level preprocessing pipelines for each dataset.
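To make the contrast concrete, here is a minimal, hypothetical sketch of an evaluation grid that scores every model under more than one preprocessing pipeline per dataset, rather than under a single standardized one. It is not the paper's framework: the toy dataset (`toy_quadratic`), the two pipelines (`standardized`, `engineered`), and the two models are illustrative assumptions, and plain Python stands in for a real ML library.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Rows = List[Tuple[float, float]]  # (feature, target) pairs
Model = Callable[[float], float]

def standardized(x: float) -> float:
    # Hypothetical "one-size-fits-all" preprocessing: feature passed through unchanged.
    return x

def engineered(x: float) -> float:
    # Hypothetical dataset-specific feature engineering: the target depends on x**2.
    return x * x

def fit_mean(xs: List[float], ys: List[float]) -> Model:
    # Baseline: always predict the training-target mean.
    m = mean(ys)
    return lambda x: m

def fit_linear(xs: List[float], ys: List[float]) -> Model:
    # Ordinary least squares with one feature plus an intercept.
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def mse(model: Model, xs: List[float], ys: List[float]) -> float:
    # Mean squared error of the model's predictions on (xs, ys).
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def evaluate(datasets: Dict[str, Tuple[Rows, Rows]],
             pipelines: Dict[str, Callable[[float], float]],
             models: Dict[str, Callable[[List[float], List[float]], Model]]
             ) -> Dict[Tuple[str, str, str], float]:
    # Score every (dataset, pipeline, model) combination on the held-out split.
    results = {}
    for d_name, (train, test) in datasets.items():
        for p_name, prep in pipelines.items():
            xs_tr, ys_tr = [prep(x) for x, _ in train], [y for _, y in train]
            xs_te, ys_te = [prep(x) for x, _ in test], [y for _, y in test]
            for m_name, fit in models.items():
                results[(d_name, p_name, m_name)] = mse(fit(xs_tr, ys_tr), xs_te, ys_te)
    return results

# Toy data (illustrative only): target = x**2 with symmetric feature values.
train = [(x, x * x) for x in (-3, -2, -1, 0, 1, 2, 3)]
test = [(x, x * x) for x in (-2.5, -0.5, 0.5, 2.5)]

results = evaluate(
    {"toy_quadratic": (train, test)},
    {"standardized": standardized, "engineered": engineered},
    {"mean_baseline": fit_mean, "linear": fit_linear},
)
for key, score in sorted(results.items()):
    print(key, round(score, 4))
```

On this toy data, the linear model is no better than the mean baseline under the standardized pipeline (both reach an MSE of 9.5625), yet it fits the held-out points exactly once the squared feature is engineered. A comparison that fixes preprocessing up front would therefore understate it, which is the kind of effect a data-centric setup is meant to surface.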

Keywords: data-centric perspective, feature engineering, tabular data
