Dynamic Design of Machine Learning Pipelines via Metalearning

Alcobaça, Edesio; de Carvalho, André C. P. L. F.

arXiv.org Artificial Intelligence 

Automated Machine Learning (AutoML) has become an essential tool for democratizing machine learning (ML) by automating key aspects of model selection, hyperparameter tuning, and feature engineering [1, 2]. However, the efficiency of AutoML frameworks remains a significant challenge, because the search for optimal configurations is often computationally expensive [3-5]. Traditional search strategies, such as Random Search (RS) and Bayesian Optimization (BO), explore large search spaces indiscriminately, resulting in high resource consumption [3, 6, 7]. To address this challenge, we propose a metalearning approach that dynamically designs search spaces for an AutoML solution, reducing computational cost while maintaining competitive predictive performance. The proposed method leverages historical metaknowledge to identify and prioritize promising regions of the search space, enabling more efficient optimization. By predicting the performance of preprocessor-classifier combinations, a meta-model induced through metalearning can give the AutoML search a warm start and thereby accelerate it. This study evaluates the proposed approach through an extensive set of experiments that analyze both computational efficiency and predictive performance. According to the experimental results, the dynamically generated search spaces significantly reduce runtime while maintaining high-quality solutions. In particular, the RS-mtl-95 configuration achieved an 89% reduction in runtime without compromising predictive performance.
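The idea of restricting the search to combinations that a meta-model predicts to be promising can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`design_search_space`, `random_search`), the `keep_fraction` parameter, the candidate names, and the placeholder scores in `PREDICTED` are all assumptions made for illustration, and the lookup table merely stands in for a trained meta-model.

```python
import math
import random

# Hypothetical candidate components (names are illustrative only).
PREPROCESSORS = ["none", "scaler", "pca"]
CLASSIFIERS = ["tree", "knn"]

# Placeholder predictions standing in for a trained meta-model.
# These numbers are made up for the sketch; they are not results.
PREDICTED = {
    ("none", "tree"): 0.60, ("none", "knn"): 0.55,
    ("scaler", "tree"): 0.62, ("scaler", "knn"): 0.80,
    ("pca", "tree"): 0.58, ("pca", "knn"): 0.75,
}


def design_search_space(candidates, predict_score, keep_fraction):
    """Keep the top fraction of candidates, ranked by predicted score."""
    ranked = sorted(candidates, key=predict_score, reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def random_search(space, evaluate, n_trials, seed=0):
    """Plain random search: sample candidates and keep the best one seen."""
    rng = random.Random(seed)
    best_pair, best_score = None, float("-inf")
    for _ in range(n_trials):
        pair = rng.choice(space)
        score = evaluate(pair)
        if score > best_score:
            best_pair, best_score = pair, score
    return best_pair, best_score


# Full space: every preprocessor-classifier combination.
full_space = [(p, c) for p in PREPROCESSORS for c in CLASSIFIERS]

# Reduced space: only the combinations the (stand-in) meta-model ranks highest.
reduced_space = design_search_space(full_space, PREDICTED.get, keep_fraction=0.5)

# In a real setting, evaluate() would train and validate the pipeline;
# here the placeholder score is reused so the sketch stays self-contained.
best_pair, best_score = random_search(reduced_space, PREDICTED.get, n_trials=10)
```

Because the search samples only from `reduced_space`, every trial is spent on a combination the meta-model considers promising, which is the intuition behind the runtime savings described in the abstract. How aggressively to prune (here `keep_fraction`) is a design choice that trades search cost against the risk of discarding a good configuration.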