auto-sklearn


Dynamic Design of Machine Learning Pipelines via Metalearning

Alcobaça, Edesio, de Carvalho, André C. P. L. F.

arXiv.org Artificial Intelligence

Automated Machine Learning (AutoML) has become an essential tool for democratizing machine learning (ML) by automating key aspects of model selection, hyperparameter tuning, and feature engineering [1, 2]. However, the efficiency of AutoML frameworks remains a significant challenge, as the search for optimal configurations is often computationally expensive [3-5]. Traditional search strategies, such as Random Search (RS) and Bayesian Optimization (BO), indiscriminately explore large search spaces, resulting in high resource consumption [3, 6, 7]. To address this challenge, we propose a metalearning approach that dynamically designs search spaces for an AutoML solution, reducing computational costs while maintaining competitive predictive performance. The proposed method leverages historical metaknowledge to identify and prioritize promising regions of the search space, enabling more efficient optimization. By predicting the performance of preprocessor-classifier combinations, a meta-model, induced using metalearning, can provide a warm-start advantage, accelerating the AutoML search process. This study evaluates the effectiveness of the proposed approach through an extensive set of experiments, analyzing both computational efficiency and predictive performance. According to the experimental results, the dynamically generated search spaces significantly reduce runtime, while maintaining high-quality solutions. In particular, the RS-mtl-95 configuration achieved an 89% reduction in runtime without compromising predictive performance.
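The warm-start idea described in this abstract can be sketched in a few lines: a meta-model scores preprocessor-classifier pairs from historical metaknowledge, and only the most promising fraction of the space is searched. All names and scores below are illustrative, not taken from the paper:

```python
# Hypothetical sketch of metalearning-based search-space pruning:
# rank candidate preprocessor-classifier pairs by a meta-model's
# predicted score and keep only the top fraction.

def prune_search_space(candidates, predicted_score, keep_fraction=0.05):
    """Keep the top `keep_fraction` of candidates by predicted score."""
    ranked = sorted(candidates, key=predicted_score, reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]

# Toy meta-model output: scores learned from past runs on similar datasets.
meta_scores = {
    ("pca", "random_forest"): 0.91,
    ("none", "svm"): 0.88,
    ("scaler", "knn"): 0.74,
    ("pca", "naive_bayes"): 0.69,
}

pruned = prune_search_space(list(meta_scores), meta_scores.get, keep_fraction=0.5)
print(pruned)  # the two highest-ranked preprocessor-classifier pairs
```

In the paper's RS-mtl-95 setting, an analogous pruning step is what lets random search run over a much smaller, higher-quality region of the space.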


Grammar-based evolutionary approach for automated workflow composition with domain-specific operators and ensemble diversity

Barbudo, Rafael, Ramírez, Aurora, Romero, José Raúl

arXiv.org Artificial Intelligence

The process of extracting valuable and novel insights from raw data involves a series of complex steps. In the realm of Automated Machine Learning (AutoML), a significant research focus is on automating aspects of this process, specifically tasks like selecting algorithms and optimising their hyper-parameters. A particularly challenging task in AutoML is automatic workflow composition (AWC). AWC aims to identify the most effective sequence of data preprocessing and ML algorithms, coupled with their best hyper-parameters, for a specific dataset. However, existing AWC methods are limited in how many and in what ways they can combine algorithms within a workflow. Addressing this gap, this paper introduces EvoFlow, a grammar-based evolutionary approach for AWC. EvoFlow enhances the flexibility in designing workflow structures, empowering practitioners to select algorithms that best fit their specific requirements. EvoFlow stands out by integrating two innovative features. First, it employs a suite of genetic operators, designed specifically for AWC, to optimise both the structure of workflows and their hyper-parameters. Second, it implements a novel updating mechanism that enriches the variety of predictions made by different workflows. Promoting this diversity helps prevent the algorithm from overfitting. With this aim, EvoFlow builds an ensemble whose workflows differ in their misclassified instances. To evaluate EvoFlow's effectiveness, we carried out empirical validation using a set of classification benchmarks. We begin with an ablation study to demonstrate the enhanced performance attributable to EvoFlow's unique components. Then, we compare EvoFlow with other AWC approaches, encompassing both evolutionary and non-evolutionary techniques. Our findings show that EvoFlow's specialised genetic operators and updating mechanism substantially outperform current leading methods[..]
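To make the "genetic operators designed specifically for AWC" concrete, here is a minimal sketch of crossover and mutation over workflows represented as a list of preprocessing steps followed by one classifier. These are illustrative operators, not EvoFlow's actual grammar-based ones:

```python
import random

# Illustrative AWC operators (not EvoFlow's exact implementation):
# a workflow is a sequence of preprocessing steps ending in a classifier.

PREPROCESSORS = ["impute", "scale", "pca", "select_kbest"]
CLASSIFIERS = ["svm", "random_forest", "knn"]

def crossover(parent_a, parent_b, rng):
    """One-point crossover over the preprocessing parts; the child
    keeps parent_a's classifier (the last element)."""
    cut = rng.randint(0, min(len(parent_a), len(parent_b)) - 1)
    return parent_a[:cut] + parent_b[cut:-1] + [parent_a[-1]]

def mutate(workflow, rng):
    """Replace one preprocessing step with a random alternative."""
    wf = list(workflow)
    if len(wf) > 1:
        i = rng.randrange(len(wf) - 1)  # never touch the classifier
        wf[i] = rng.choice(PREPROCESSORS)
    return wf

rng = random.Random(0)
a = ["impute", "scale", "svm"]
b = ["pca", "select_kbest", "knn"]
child = crossover(a, b, rng)
print(mutate(child, rng))
```

A grammar, as used in EvoFlow, constrains such operators so that every offspring is a syntactically valid workflow.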


Fix Fairness, Don't Ruin Accuracy: Performance Aware Fairness Repair using AutoML

Nguyen, Giang, Biswas, Sumon, Rajan, Hridesh

arXiv.org Artificial Intelligence

Machine learning (ML) is increasingly being used in critical decision-making software, but incidents have raised questions about the fairness of ML predictions. To address this issue, new tools and methods are needed to mitigate bias in ML-based software. Previous studies have proposed bias mitigation algorithms that only work in specific situations and often result in a loss of accuracy. Our proposed solution is a novel approach that utilizes automated machine learning (AutoML) techniques to mitigate bias. Our approach includes two key innovations: a novel optimization function and a fairness-aware search space. By improving the default optimization function of AutoML and incorporating fairness objectives, we are able to mitigate bias with little to no loss of accuracy. Additionally, we propose a fairness-aware search space pruning method for AutoML to reduce computational cost and repair time. Our approach, built on the state-of-the-art Auto-Sklearn tool, is designed to reduce bias in real-world scenarios. To demonstrate its effectiveness, we evaluated our approach on four fairness problems and 16 different ML models, and the results show a significant improvement over the baseline and existing bias mitigation techniques. Our approach, Fair-AutoML, successfully repaired 60 out of 64 buggy cases, while existing bias mitigation techniques repaired at most 44 out of 64 cases.
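The general shape of a fairness-aware optimization function, as described above, can be sketched by penalizing accuracy with a group-fairness metric. This is an illustrative combination, not Fair-AutoML's exact objective; the statistical parity difference used here is one standard bias measure:

```python
# Illustrative fairness-aware objective: accuracy minus a weighted
# fairness penalty, so the search prefers accurate AND unbiased models.

def statistical_parity_difference(preds, group):
    """Absolute gap in positive-prediction rates between groups 0 and 1."""
    def rate(g):
        members = [p for p, grp in zip(preds, group) if grp == g]
        return sum(members) / max(1, len(members))
    return abs(rate(0) - rate(1))

def fairness_aware_score(accuracy, preds, group, lam=1.0):
    """`lam` trades predictive performance against fairness."""
    return accuracy - lam * statistical_parity_difference(preds, group)

# Toy example: two models with equal accuracy but different bias.
preds_biased = [1, 1, 1, 0, 0, 0]
preds_fair   = [1, 1, 0, 1, 0, 0]
group        = [0, 0, 0, 1, 1, 1]
print(fairness_aware_score(0.8, preds_biased, group))
print(fairness_aware_score(0.8, preds_fair, group))  # the fairer model scores higher
```

Plugging such a score into the AutoML search loop steers model selection toward fair configurations without hard-coding a single mitigation algorithm.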


Q(D)O-ES: Population-based Quality (Diversity) Optimisation for Post Hoc Ensemble Selection in AutoML

Purucker, Lennart, Schneider, Lennart, Anastacio, Marie, Beel, Joeran, Bischl, Bernd, Hoos, Holger

arXiv.org Artificial Intelligence

Automated machine learning (AutoML) systems commonly ensemble models post hoc to improve predictive performance, typically via greedy ensemble selection (GES). However, we believe that GES may not always be optimal, as it performs a simple deterministic greedy search. In this work, we introduce two novel population-based ensemble selection methods, QO-ES and QDO-ES, and compare them to GES. While QO-ES optimises solely for predictive performance, QDO-ES also considers the diversity of ensembles within the population, maintaining a diverse set of well-performing ensembles during optimisation based on ideas of quality diversity optimisation. The methods are evaluated using 71 classification datasets from the AutoML benchmark, demonstrating that QO-ES and QDO-ES often outrank GES, although the difference is statistically significant only on validation data. Our results further suggest that diversity can be beneficial for post hoc ensembling but also increases the risk of overfitting.
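For context, the GES baseline the paper compares against can be sketched in a few lines: repeatedly add, with replacement, the model whose inclusion most improves the ensemble's validation score. The data here is a toy stand-in:

```python
# Minimal sketch of greedy ensemble selection (GES): deterministic,
# greedy, with replacement, as in Caruana-style ensemble selection.

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def ensemble_predict(member_preds):
    """Majority vote over the (possibly repeated) ensemble members."""
    return [max(set(col), key=col.count) for col in zip(*member_preds)]

def greedy_ensemble_selection(model_preds, labels, rounds=5):
    ensemble = []
    for _ in range(rounds):
        best = max(model_preds,
                   key=lambda p: accuracy(ensemble_predict(ensemble + [p]), labels))
        ensemble.append(best)
    return ensemble

# Three models' predictions on a 4-example validation set.
labels = [1, 0, 1, 1]
models = [[1, 0, 0, 1], [1, 1, 1, 1], [0, 0, 1, 1]]
chosen = greedy_ensemble_selection(models, labels, rounds=3)
print(accuracy(ensemble_predict(chosen), labels))
```

QO-ES and QDO-ES replace this single greedy trajectory with a population of candidate ensembles, which is where diversity can be maintained.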


Auto-Sklearn: How To Boost Performance and Efficiency Through Automated Machine Learning

#artificialintelligence

Many of us are familiar with the challenge of selecting a suitable machine learning model for a specific prediction task, given the vast number of models to choose from. On top of that, we also need to find optimal hyperparameters in order to maximize our model's performance. These challenges can largely be overcome through automated machine learning, or AutoML. I say largely because, despite its name, the process is not fully automated and still requires some manual tweaking and decision-making by the user. Essentially, AutoML frees the user from the daunting and time-consuming tasks of data preprocessing, model selection, hyperparameter optimization, and ensemble building.
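The model selection and hyperparameter search that AutoML takes off the user's hands can be illustrated with a toy exhaustive search; the scoring function below is a stand-in for actually training and cross-validating each candidate (the search space and scores are made up):

```python
from itertools import product

# Toy illustration of the search AutoML automates: try every
# (model, hyperparameter) combination and keep the best validation score.

search_space = {
    "knn": {"n_neighbors": [3, 5, 7]},
    "svm": {"C": [0.1, 1.0, 10.0]},
}

def toy_validation_score(model, params):
    # Stand-in for fit + cross-validate; svm with C=1.0 is the optimum here.
    optima = {"knn": ("n_neighbors", 5, 0.82), "svm": ("C", 1.0, 0.90)}
    key, best_value, peak = optima[model]
    return peak if params[key] == best_value else peak - 0.05

def search(space):
    best_cfg, best_score = None, float("-inf")
    for model, grid in space.items():
        for values in product(*grid.values()):
            params = dict(zip(grid, values))
            score = toy_validation_score(model, params)
            if score > best_score:
                best_cfg, best_score = (model, params), score
    return best_cfg, best_score

print(search(search_space))  # svm with C=1.0 wins
```

Auto-Sklearn performs a far more sophisticated version of this loop (Bayesian optimization over a structured space), but the user-facing effect is the same: the best configuration comes back without manual trial and error.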


AutoEn: An AutoML method based on ensembles of predefined Machine Learning pipelines for supervised Traffic Forecasting

Angarita-Zapata, Juan S., Masegosa, Antonio D., Triguero, Isaac

arXiv.org Artificial Intelligence

Intelligent Transportation Systems produce large volumes of traffic data that are hard to manage, which motivates the use of Machine Learning (ML) for data-driven applications, such as Traffic Forecasting (TF). TF is gaining relevance due to its ability to mitigate traffic congestion by forecasting future traffic states. However, TF poses a major challenge to the ML paradigm, known as the Model Selection Problem (MSP): deciding the most suitable combination of data preprocessing techniques and ML method for traffic data collected under different transportation circumstances. In this context, Automated Machine Learning (AutoML), the automation of the ML workflow from data preprocessing to model validation, arises as a promising strategy to deal with the MSP in problem domains wherein expert ML knowledge is not always an available or affordable asset, such as TF. Various AutoML frameworks have been used to approach the MSP in TF. Most are based on online optimisation processes to search for the best-performing pipeline on a given dataset. This online optimisation could be complemented with meta-learning to warm-start the search phase and/or the construction of ensembles using pipelines derived from the optimisation process. However, given the complexity of the search space and the high computational cost of tuning and evaluating the generated pipelines, online optimisation is only beneficial when there is a long time to obtain the final model. Thus, we introduce AutoEn, a simple and efficient method for automatically generating multi-classifier ensembles from a predefined set of ML pipelines. We compare AutoEn against Auto-WEKA and Auto-sklearn, two AutoML methods commonly used in TF. Experimental results demonstrate that AutoEn can lead to better or more competitive results in the general-purpose domain and in TF.
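The core idea of building an ensemble from a predefined pipeline pool, rather than from an online search, can be sketched as follows. This is an illustrative simplification, not AutoEn's exact selection rule; pipeline names and scores are invented:

```python
# Illustrative sketch: keep the k best pipelines from a predefined pool
# (by validation accuracy) and combine their predictions by majority vote.

def top_k_ensemble(pipeline_scores, k):
    """pipeline_scores: {pipeline_name: validation_accuracy}."""
    ranked = sorted(pipeline_scores, key=pipeline_scores.get, reverse=True)
    return ranked[:k]

def majority_vote(predictions):
    """predictions: list of per-pipeline prediction lists."""
    return [max(set(col), key=col.count) for col in zip(*predictions)]

scores = {"pipe_a": 0.81, "pipe_b": 0.86, "pipe_c": 0.79, "pipe_d": 0.84}
members = top_k_ensemble(scores, k=3)
print(members)  # ['pipe_b', 'pipe_d', 'pipe_a']
```

Because the pool is fixed in advance, the only per-dataset cost is evaluating the predefined pipelines, which is what makes this approach attractive when there is no time budget for a full online search.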


Auto-sklearn: Efficient and Robust Automated Machine Learning

#artificialintelligence

The success of machine learning in a broad range of applications has led to an ever-growing demand for machine learning systems that can be used off the shelf by non-experts. To be effective in practice, such systems need to automatically choose a good algorithm and feature preprocessing steps for a new dataset at hand, and also set their respective hyperparameters. Recent work has started to tackle this automated machine learning (AutoML) problem with the help of efficient Bayesian optimization methods. Building on this, we introduce a robust new AutoML system based on the Python machine learning package scikit-learn (using 15 classifiers, 14 feature preprocessing methods, and 4 data preprocessing methods, giving rise to a structured hypothesis space with 110 hyperparameters). This system, which we dub Auto-sklearn, improves on existing AutoML methods by automatically taking into account past performance on similar datasets, and by constructing ensembles from the models evaluated during the optimization.
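The "past performance on similar datasets" mechanism can be sketched as a nearest-neighbour lookup over dataset meta-features: find the historical datasets closest to the new one and seed the search with configurations that worked well on them. This is a drastic simplification (Auto-sklearn uses many more meta-features and a full configuration database); all values below are illustrative:

```python
# Simplified sketch of meta-learning warm-starting: L1 distance over a
# few dataset meta-features, then seed with the neighbours' best configs.

def nearest_datasets(new_mf, history, k=2):
    """history: {dataset_name: metafeature_vector}."""
    def dist(mf):
        return sum(abs(a - b) for a, b in zip(new_mf, mf))
    return sorted(history, key=lambda d: dist(history[d]))[:k]

history = {
    "iris":   [150, 4, 3],     # n_samples, n_features, n_classes
    "digits": [1797, 64, 10],
    "wine":   [178, 13, 3],
}
best_configs = {"iris": "lda", "digits": "svm_rbf", "wine": "rf_entropy"}

new_dataset = [160, 5, 3]
seeds = [best_configs[d] for d in nearest_datasets(new_dataset, history)]
print(seeds)  # configurations the optimizer evaluates first
```

Evaluating these seeds first gives Bayesian optimization strong incumbents from the start, which is what the warm start buys.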


A Scalable AutoML Approach Based on Graph Neural Networks

Helali, Mossad, Mansour, Essam, Abdelaziz, Ibrahim, Dolby, Julian, Srinivas, Kavitha

arXiv.org Artificial Intelligence

AutoML systems build machine learning models automatically by performing a search over valid data transformations and learners, along with hyper-parameter optimization for each learner. Many AutoML systems use meta-learning to guide the search for optimal pipelines. In this work, we present a novel meta-learning system called KGpip which (1) builds a database of datasets and corresponding pipelines by mining thousands of scripts with program analysis, (2) uses dataset embeddings to find similar datasets in the database based on dataset content rather than metadata-based features, and (3) models AutoML pipeline creation as a graph generation problem, to succinctly characterize the diverse pipelines seen for a single dataset. KGpip's meta-learning is designed as a sub-component for AutoML systems. We demonstrate this by integrating KGpip with two AutoML systems. Our comprehensive evaluation using 126 datasets, including those used by the state-of-the-art systems, shows that KGpip significantly outperforms these systems.
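KGpip's retrieval step, content-embedding similarity instead of hand-crafted meta-features, can be sketched with a cosine-similarity lookup. The embedding vectors below are made up purely for illustration:

```python
import math

# Illustrative sketch of embedding-based dataset retrieval: rank stored
# datasets by cosine similarity of their content embeddings to a query.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: math.sqrt(sum(a * a for a in x))
    return dot / (norm(u) * norm(v))

embeddings = {
    "sales_2020": [0.9, 0.1, 0.2],
    "reviews":    [0.1, 0.8, 0.5],
    "sales_2021": [0.6, 0.3, 0.3],
}

query = [0.85, 0.15, 0.15]  # embedding of the new dataset
most_similar = max(embeddings, key=lambda d: cosine(embeddings[d], query))
print(most_similar)  # pipelines mined for this dataset seed the search
```

In KGpip, the pipelines mined for the retrieved dataset then condition the graph-generation model that proposes pipelines for the new one.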


10 Best AutoML Tools Used in Data Science Projects for 2022

#artificialintelligence

Automatic Machine Learning (AutoML), also known as AutoML services or tools, allows data scientists, machine learning engineers, and non-technical users to create scalable machine-learning models. Here's a list of the top 10 AutoML tools used in data science projects in 2022. AutoML tools automate this process by automatically analyzing the data and selecting candidate models based on insights gained from that analysis. These models are built, tested, and refined using a subset of the available data. Finally, the best-performing models are presented to the user. AutoML tools let users trade off model complexity against performance.


Auto-Sklearn: Accelerate your machine learning models with AutoML

#artificialintelligence

AutoML is a relatively new and fast-growing subfield of machine learning. The main approach in AutoML is to limit the involvement of data scientists and let the tool handle the time-consuming steps of machine learning, such as data preprocessing, algorithm selection, and hyperparameter tuning, thereby saving the time needed to set up ML models and speeding up their deployment. Several AutoML tools are available on the market these days. In one of my previous blogathon articles, I shared a comprehensive guide to AutoML with an easy AutoGluon example; that guide included a list of several AutoML tools currently available on the market.