AutoML Benchmark with shorter time constraints and early stopping
Jurado, Israel Campero; Gijsbers, Pieter; Vanschoren, Joaquin
arXiv.org Artificial Intelligence
Automated Machine Learning (AutoML) automatically builds machine learning (ML) models from data. The de facto standard for evaluating new AutoML frameworks on tabular data is the AutoML Benchmark (AMLB). AMLB proposed evaluating AutoML frameworks with 1- and 4-hour time budgets across 104 tasks. We argue that the benchmark should also consider shorter time constraints, both because of their practical value (for example, when models must be retrained frequently) and to make AMLB more accessible. This work considers two ways to reduce the overall computation used in the benchmark: shorter time constraints and early stopping. We evaluate 11 AutoML frameworks on 104 tasks under different time constraints and find that the relative ranking of the frameworks is fairly consistent across time constraints, but that early stopping leads to greater variation in model performance.

In Machine Learning (ML), manually creating good models is time-consuming and knowledge-intensive. Automated Machine Learning (AutoML) uses efficient automated search methods to create models for new data, often reducing computational costs in the process (Hutter et al., 2019; Hollmann et al., 2022). The AutoML Benchmark (AMLB; Gijsbers et al., 2024) has become the standard for evaluating AutoML frameworks on tabular data, greatly improving reproducibility and comparability in AutoML research. We note that the time budgets proposed by Gijsbers et al. (2024) were based on what seemed "practically reasonable" at the time, as reflected in many frameworks' default time budget of one hour. While the authors motivate evaluating methods on two time budgets as a proxy for anytime performance, they do not justify the particular choice of 1 hour and 4 hours. We therefore study how AutoML frameworks behave under different time constraints.
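The claim that framework rankings stay fairly consistent across time constraints can be made concrete with a small sketch. This is a minimal illustration under stated assumptions, not the paper's analysis: it assumes a hypothetical `results[task][framework]` mapping of scores where higher is better and every framework has a score on every task, computes each framework's mean rank across tasks, and compares the rankings obtained under two budgets with Kendall's tau (the tau-a variant, where tied pairs count as neither concordant nor discordant). All function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def ranks_within_task(scores: dict[str, float]) -> dict[str, float]:
    """Rank frameworks on one task (1 = best; higher score is better).

    Tied scores share the average of the ranks they span.
    """
    ordered = sorted(scores, key=lambda fw: scores[fw], reverse=True)
    ranks: dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of frameworks tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg_rank
        i = j + 1
    return ranks


def mean_ranks(results: dict[str, dict[str, float]]) -> dict[str, float]:
    """Map results[task][framework] -> score to the mean rank of each framework.

    Assumes every framework has a score on every task (no failed runs).
    """
    per_task = [ranks_within_task(scores) for scores in results.values()]
    frameworks = per_task[0].keys()
    return {fw: mean(r[fw] for r in per_task) for fw in frameworks}


def kendall_tau(a: dict[str, float], b: dict[str, float]) -> float:
    """Kendall's tau-a between two rankings of the same frameworks.

    Returns 1.0 for identical orderings and -1.0 for reversed ones.
    """
    concordant = discordant = 0
    for x, y in combinations(sorted(a), 2):
        s = (a[x] - a[y]) * (b[x] - b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) / 2
    return (concordant - discordant) / n_pairs
```

For example, with hypothetical result tables `results_1h` and `results_10min`, a value of `kendall_tau(mean_ranks(results_1h), mean_ranks(results_10min))` close to 1 would indicate that the ordering of frameworks barely changes when the budget shrinks, while a value near 0 would indicate little agreement. Rank-based comparison is used here because raw scores are not comparable across tasks with different metrics or difficulty.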
We conduct similar experiments and analyses for frameworks with early stopping, offering insights into its potential to reduce energy consumption in AutoML systems. In practice, we often see that the original benchmarking suite or time constraints (1 hour and 4 hours) are not used as proposed.
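The text does not specify which early-stopping rules the evaluated frameworks use, and each framework implements its own. As a hypothetical illustration of why early stopping can save compute within a fixed time budget, the sketch below uses a generic patience-based rule: the search stops either when the budget is exhausted or when `patience` consecutive candidates fail to improve the best validation score by more than `min_delta`. The names `search_with_early_stopping`, `patience`, and `min_delta` are assumptions for this sketch, not any framework's API.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def search_with_early_stopping(
    candidates: Iterable[object],
    evaluate: Callable[[object], float],
    time_budget_s: float,
    patience: int,
    min_delta: float = 0.0,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[object], float, int]:
    """Evaluate candidates until the budget runs out or progress stalls.

    Higher validation scores are better. Returns
    (best_candidate, best_score, n_evaluated).
    """
    start = clock()
    best, best_score = None, float("-inf")
    since_improvement = 0
    n_evaluated = 0
    for candidate in candidates:
        # The budget is only checked between evaluations, so a single slow
        # evaluation can overrun it; real frameworks may enforce it differently.
        if clock() - start >= time_budget_s:
            break  # hard time constraint reached
        score = evaluate(candidate)
        n_evaluated += 1
        if score > best_score + min_delta:
            best, best_score = candidate, score
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                break  # early stop: no improvement for `patience` candidates
    return best, best_score, n_evaluated
```

The trade-off this sketch exposes is the one the text points to: a small `patience` ends the search sooner and saves compute, but it may stop just before a late improvement, which is one plausible source of greater variation in final model performance.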
Apr-16-2025
- Country:
- Asia > Middle East
- Israel (0.04)
- Europe
- Belgium > Flanders
- East Flanders > Ghent (0.04)
- Netherlands > North Brabant
- Eindhoven (0.05)
- North America > Montserrat (0.04)
- Genre:
- Research Report > Experimental Study (0.68)
- Industry:
- Energy (0.34)