Feurer, Matthias
Position: A Call to Action for a Human-Centered AutoML Paradigm
Lindauer, Marius, Karl, Florian, Klier, Anne, Moosbauer, Julia, Tornede, Alexander, Mueller, Andreas, Hutter, Frank, Feurer, Matthias, Bischl, Bernd
Automated machine learning (AutoML) was formed around the fundamental objectives of automatically and efficiently configuring machine learning (ML) workflows, aiding the research of new ML algorithms, and contributing to the democratization of ML by making it accessible to a broader audience. Over the past decade, commendable achievements in AutoML have primarily focused on optimizing predictive performance. This focused progress, while substantial, raises questions about how well AutoML has met its broader, original goals. In this position paper, we argue that a key to unlocking AutoML's full potential lies in addressing the currently underexplored aspect of user interaction with AutoML systems, including their diverse roles, expectations, and expertise. We envision a more human-centered approach in future AutoML research, promoting the collaborative design of ML systems that tightly integrates the complementary strengths of human expertise and AutoML methodologies.
Position: Why We Must Rethink Empirical Research in Machine Learning
Herrmann, Moritz, Lange, F. Julian D., Eggensperger, Katharina, Casalicchio, Giuseppe, Wever, Marcel, Feurer, Matthias, Rügamer, David, Hüllermeier, Eyke, Boulesteix, Anne-Laure, Bischl, Bernd
In practice, this leads to non-replicable results, makes findings unreliable, and threatens to undermine progress in the field. It may also jeopardize applied researchers' confidence in experimental results and discourage them from applying ML methods, even though these novel approaches might be beneficial, for example in the medical domain, where ML is increasingly being used. To overcome this alarming situation, we call for more awareness of the plurality of ways of gaining knowledge experimentally, but also of some epistemic limitations.
Reshuffling Resampling Splits Can Improve Generalization of Hyperparameter Optimization
Nagler, Thomas, Schneider, Lennart, Bischl, Bernd, Feurer, Matthias
Hyperparameter optimization is crucial for obtaining peak performance of machine learning models. The standard protocol evaluates various hyperparameter configurations using a resampling estimate of the generalization error to guide optimization and select a final hyperparameter configuration. Without much evidence, paired resampling splits, i.e., either a fixed train-validation split or a fixed cross-validation scheme, are often recommended. We show that, surprisingly, reshuffling the splits for every configuration often improves the final model's generalization performance on unseen data. Our theoretical analysis explains how reshuffling affects the asymptotic behavior of the validation loss surface and provides a bound on the expected regret in the limiting regime. This bound connects the potential benefits of reshuffling to the signal and noise characteristics of the underlying optimization problem. We confirm our theoretical results in a controlled simulation study and demonstrate the practical usefulness of reshuffling in a large-scale, realistic hyperparameter optimization experiment. While reshuffling leads to test performances that are competitive with using fixed splits, it drastically improves results for a single train-validation holdout protocol and can often make holdout competitive with standard cross-validation while being computationally cheaper.
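The contrast studied in the paper is straightforward to illustrate. The following minimal sketch (not the authors' implementation; the dataset, model, and search space are arbitrary stand-ins) evaluates each hyperparameter configuration either on one fixed holdout split or on a split that is reshuffled per configuration:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, random_state=0)
configs = [{"C": c} for c in np.logspace(-2, 2, 20)]  # toy search space

def holdout_score(config, seed):
    # Train/validation split controlled by `seed`.
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=seed)
    return SVC(**config).fit(X_tr, y_tr).score(X_val, y_val)

# Fixed protocol: every configuration is evaluated on the same split.
fixed = [holdout_score(cfg, seed=0) for cfg in configs]

# Reshuffled protocol: a fresh split per configuration decorrelates the
# validation noise across configurations.
reshuffled = [holdout_score(cfg, seed=i) for i, cfg in enumerate(configs)]

print(configs[int(np.argmax(fixed))], configs[int(np.argmax(reshuffled))])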
Interpretable Machine Learning for TabPFN
Rundel, David, Kobialka, Julius, von Crailsheim, Constantin, Feurer, Matthias, Nagler, Thomas, Rügamer, David
The recently developed Prior-Data Fitted Networks (PFNs) have shown very promising results for applications in low-data regimes. The TabPFN model, a special case of PFNs for tabular data, is able to achieve state-of-the-art performance on a variety of classification tasks while producing posterior predictive distributions in mere seconds by in-context learning without the need for learning parameters or hyperparameter tuning. This makes TabPFN a very attractive option for a wide range of domain applications. However, a major drawback of the method is its lack of interpretability. Therefore, we propose several adaptations of popular interpretability methods that we specifically design for TabPFN. By taking advantage of the unique properties of the model, our adaptations allow for more efficient computations than existing implementations. In particular, we show how in-context learning facilitates the estimation of Shapley values by avoiding approximate retraining and enables the use of Leave-One-Covariate-Out (LOCO) even when working with large-scale Transformers. In addition, we demonstrate how data valuation methods can be used to address scalability challenges of TabPFN.
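As a point of reference for the LOCO adaptation, the following sketch shows plain Leave-One-Covariate-Out importance with a generic scikit-learn classifier standing in for TabPFN (this is not the paper's TabPFN-specific implementation). With an in-context learner, the per-covariate "refit" reduces to another forward pass over the context with one column removed, which is what makes LOCO affordable for large Transformers:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression  # stand-in for TabPFN
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def risk(X_tr, y_tr, X_te, y_te):
    # Fit on the context data and score predictive risk on held-out data.
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return log_loss(y_te, model.predict_proba(X_te))

base = risk(X_tr, y_tr, X_te, y_te)

# LOCO importance of covariate j: increase in risk when j is left out.
loco = []
for j in range(X.shape[1]):
    keep = [k for k in range(X.shape[1]) if k != j]
    loco.append(risk(X_tr[:, keep], y_tr, X_te[:, keep], y_te) - base)

print(np.round(loco, 3))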
PFNs4BO: In-Context Learning for Bayesian Optimization
Müller, Samuel, Feurer, Matthias, Hollmann, Noah, Hutter, Frank
In this paper, we use Prior-data Fitted Networks (PFNs) as a flexible surrogate for Bayesian Optimization (BO). PFNs are neural processes that are trained to approximate the posterior predictive distribution (PPD) through in-context learning on any prior distribution that can be efficiently sampled from. We describe how this flexibility can be exploited for surrogate modeling in BO. We use PFNs to mimic a naive Gaussian process (GP), an advanced GP, and a Bayesian Neural Network (BNN). In addition, we show how to incorporate further information into the prior, such as allowing hints about the position of optima (user priors), ignoring irrelevant dimensions, and performing non-myopic BO by learning the acquisition function. The flexibility underlying these extensions opens up vast possibilities for using PFNs for BO. We demonstrate the usefulness of PFNs for BO in a large-scale evaluation on artificial GP samples and three different hyperparameter optimization testbeds: HPO-B, Bayesmark, and PD1. We publish code alongside trained models at github.com/automl/PFNs4BO.
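To make the role of the surrogate concrete, here is a minimal Bayesian optimization loop in which the surrogate's posterior predictive mean and standard deviation drive an expected improvement acquisition function. This is an illustration only: a Gaussian process stands in for the trained PFN, which in PFNs4BO would supply the predictive distribution via in-context learning on the observed points:

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):                        # toy 1-d black box to minimize
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(3, 1))       # initial design
y = objective(X).ravel()
candidates = np.linspace(0, 3, 200).reshape(-1, 1)

for _ in range(10):
    surrogate = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    mu, sigma = surrogate.predict(candidates, return_std=True)
    best = y.min()
    z = (best - mu) / np.clip(sigma, 1e-9, None)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = candidates[np.argmax(ei)].reshape(1, -1)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

print("best observed value:", y.min())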
Can Fairness be Automated? Guidelines and Opportunities for Fairness-aware AutoML
Weerts, Hilde, Pfisterer, Florian, Feurer, Matthias, Eggensperger, Katharina, Bergman, Edward, Awad, Noor, Vanschoren, Joaquin, Pechenizkiy, Mykola, Bischl, Bernd, Hutter, Frank
The field of automated machine learning (AutoML) introduces techniques that automate parts of the development of machine learning (ML) systems, accelerating the process and reducing barriers for novices. However, decisions derived from ML models can reproduce, amplify, or even introduce unfairness in our societies, causing harm to (groups of) individuals. In response, researchers have started to propose AutoML systems that jointly optimize fairness and predictive performance to mitigate fairness-related harm. However, fairness is a complex and inherently interdisciplinary subject, and solely posing it as an optimization problem can have adverse side effects. With this work, we aim to raise awareness among developers of AutoML systems about such limitations of fairness-aware AutoML, while also calling attention to the potential of AutoML as a tool for fairness research. We present a comprehensive overview of different ways in which fairness-related harm can arise and the ensuing implications for the design of fairness-aware AutoML. We conclude that while fairness cannot be automated, fairness-aware AutoML can play an important role in the toolbox of an ML practitioner. We highlight several open technical challenges for future work in this direction. Additionally, we advocate for the creation of more user-centered assistive systems designed to tackle challenges encountered in fairness work.
Mind the Gap: Measuring Generalization Performance Across Multiple Objectives
Feurer, Matthias, Eggensperger, Katharina, Bergman, Edward, Pfisterer, Florian, Bischl, Bernd, Hutter, Frank
Modern machine learning models are often constructed taking into account multiple objectives, e.g., minimizing inference time while also maximizing accuracy. Multi-objective hyperparameter optimization (MHPO) algorithms return such candidate models, and the approximation of the Pareto front is used to assess their performance. In practice, we also want to measure generalization when moving from the validation to the test set. However, some of the models might no longer be Pareto-optimal which makes it unclear how to quantify the performance of the MHPO method when evaluated on the test set. To resolve this, we provide a novel evaluation protocol that allows measuring the generalization performance of MHPO methods and studying its capabilities for comparing two optimization experiments.
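The gap between validation and test can be illustrated with a small sketch (this simplifies the paper's protocol and uses synthetic objective values): configurations that form the Pareto front on the validation objectives are re-evaluated on test objectives, where some may no longer be non-dominated, and the dominated hypervolume of the same set of configurations changes accordingly:

import numpy as np

def pareto_mask(points):
    # Non-dominated mask, assuming all objectives are minimized.
    n = len(points)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(points[j] <= points[i]) and np.any(points[j] < points[i]):
                mask[i] = False
                break
    return mask

def hypervolume_2d(points, ref):
    # Dominated hypervolume in two dimensions (minimization) w.r.t. `ref`.
    front = points[pareto_mask(points)]
    front = front[np.argsort(front[:, 0])]
    hv, prev_y = 0.0, ref[1]
    for x, y in front:
        hv += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return hv

rng = np.random.default_rng(0)
val = rng.uniform(0, 1, size=(30, 2))               # (error, latency) on validation
test = val + rng.normal(0, 0.05, size=val.shape)    # noisy test estimates

selected = np.where(pareto_mask(val))[0]            # Pareto set chosen on validation
ref = np.array([1.5, 1.5])
print("validation hypervolume:", hypervolume_2d(val[selected], ref))
print("test hypervolume of the same configurations:", hypervolume_2d(test[selected], ref))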
SMAC3: A Versatile Bayesian Optimization Package for Hyperparameter Optimization
Lindauer, Marius, Eggensperger, Katharina, Feurer, Matthias, Biedenkapp, André, Deng, Difan, Benjamins, Carolin, Sass, René, Hutter, Frank
Algorithm parameters, in particular hyperparameters of machine learning algorithms, can substantially impact their performance. To support users in determining well-performing hyperparameter configurations for their algorithms, datasets and applications at hand, SMAC3 offers a robust and flexible framework for Bayesian Optimization, which can improve performance within a few evaluations. It offers several facades and pre-sets for typical use cases, such as optimizing hyperparameters, solving low-dimensional continuous (artificial) global optimization problems and configuring algorithms to perform well across multiple problem instances. The SMAC3 package is available under a permissive BSD license at https://github.com/automl/SMAC3.
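A minimal usage sketch, assuming the facade-style API shown in the SMAC3 documentation for recent (2.x) releases (function and argument names may differ in older versions), optimizing a toy one-dimensional quadratic:

from ConfigSpace import Configuration, ConfigurationSpace
from smac import HyperparameterOptimizationFacade, Scenario

def quadratic(config: Configuration, seed: int = 0) -> float:
    # Cost to minimize; SMAC passes a configuration and a seed.
    return config["x"] ** 2

configspace = ConfigurationSpace({"x": (-5.0, 5.0)})        # one float hyperparameter
scenario = Scenario(configspace, deterministic=True, n_trials=50)
smac = HyperparameterOptimizationFacade(scenario, quadratic)
incumbent = smac.optimize()
print(incumbent)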
Squirrel: A Switching Hyperparameter Optimizer
Awad, Noor, Shala, Gresa, Deng, Difan, Mallik, Neeratyoy, Feurer, Matthias, Eggensperger, Katharina, Biedenkapp, André, Vermetten, Diederick, Wang, Hao, Doerr, Carola, Lindauer, Marius, Hutter, Frank
In this short note, we describe our submission to the NeurIPS 2020 BBO challenge. Motivated by the fact that different optimizers work well on different problems, our approach switches between different optimizers. Since the team names on the competition's leaderboard were randomly generated "alliteration nicknames", consisting of an adjective and an animal with the same initial letter, we called our approach the Switching Squirrel, or, for short, Squirrel. The challenge required suggesting 16 successive batches of 8 hyperparameter configurations at a time. We chose to use only one optimizer for a given batch, warmstarted with all previous observations.
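Schematically (this is not the actual Squirrel code; the two stand-in optimizers and the switching schedule are made up for illustration), the batch-wise switching with warmstarting looks as follows:

import numpy as np

rng = np.random.default_rng(0)

def objective(x):                         # toy 2-d black box to minimize
    return float(np.sum((x - 0.3) ** 2))

def random_batch(history, size):          # global exploration stand-in
    return [rng.uniform(0, 1, 2) for _ in range(size)]

def local_batch(history, size):           # exploitation around the incumbent
    best_x, _ = min(history, key=lambda h: h[1])
    return [np.clip(best_x + rng.normal(0, 0.05, 2), 0, 1) for _ in range(size)]

history = []                              # all observations so far (warmstart data)
for batch_id in range(16):                # 16 batches of 8, as in the challenge
    optimizer = random_batch if batch_id < 4 else local_batch   # simple switching schedule
    for x in optimizer(history, size=8):
        history.append((x, objective(x)))

print("best value found:", min(v for _, v in history))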
Auto-Sklearn 2.0: The Next Generation
Feurer, Matthias, Eggensperger, Katharina, Falkner, Stefan, Lindauer, Marius, Hutter, Frank
Automated Machine Learning, which supports practitioners and researchers with the tedious task of manually designing machine learning pipelines, has recently achieved substantial success. In this paper we introduce new Automated Machine Learning (AutoML) techniques motivated by our winning submission to the second ChaLearn AutoML challenge, PoSH Auto-sklearn. For this, we extend Auto-sklearn with a new, simpler meta-learning technique, improve its way of handling iterative algorithms and enhance it with a successful bandit strategy for budget allocation. Furthermore, we go one step further and study the design space of AutoML itself and propose a solution towards truly hands-free AutoML. Together, these changes give rise to the next generation of our AutoML system, Auto-sklearn (2.0). We verify the improvements from these additions in a large experimental study on 39 AutoML benchmark datasets and conclude the paper by comparing to Auto-sklearn (1.0), reducing the regret by up to a factor of five.
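For context, a minimal usage sketch assuming the experimental AutoSklearn2Classifier interface described in the auto-sklearn documentation (the package must be installed and runs on Linux; argument names may differ across versions):

from autosklearn.experimental.askl2 import AutoSklearn2Classifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Auto-sklearn 2.0 warmstarts from a meta-learned portfolio and chooses its
# model-selection strategy (e.g., holdout vs. cross-validation, budget
# allocation) automatically.
automl = AutoSklearn2Classifier(time_left_for_this_task=300)  # 5-minute budget
automl.fit(X_train, y_train)
print(accuracy_score(y_test, automl.predict(X_test)))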