autogluon
TabArena: A Living Benchmark for Machine Learning on Tabular Data
Erickson, Nick, Purucker, Lennart, Tschalzev, Andrej, Holzmüller, David, Desai, Prateek Mutalik, Salinas, David, Hutter, Frank
With the growing popularity of deep learning and foundation models for tabular data, the need for standardized and reliable benchmarks is higher than ever. However, current benchmarks are static. Their design is not updated even if flaws are discovered, model versions are updated, or new models are released. To address this, we introduce TabArena, the first continuously maintained living tabular benchmarking system. To launch TabArena, we manually curate a representative collection of datasets and well-implemented models, conduct a large-scale benchmarking study to initialize a public leaderboard, and assemble a team of experienced maintainers. Our results highlight the influence of validation method and ensembling of hyperparameter configurations to benchmark models at their full potential. While gradient-boosted trees are still strong contenders on practical tabular datasets, we observe that deep learning methods have caught up under larger time budgets with ensembling. At the same time, foundation models excel on smaller datasets. Finally, we show that ensembles across models advance the state-of-the-art in tabular machine learning. We observe that some deep learning models are overrepresented in cross-model ensembles due to validation set overfitting, and we encourage model developers to address this issue. We launch TabArena with a public leaderboard, reproducible code, and maintenance protocols to create a living benchmark available at https://tabarena.ai.
- Europe > Netherlands > South Holland > Leiden (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)
- Asia > Malaysia > Pahang > Kuantan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Banking & Finance (1.00)
- (2 more...)
Quickly Tuning Foundation Models for Image Segmentation
Das, Breenda, Purucker, Lennart, Carstensen, Timur, Hutter, Frank
Foundation models like SAM (Segment Anything Model) exhibit strong zero-shot image segmentation performance, but often fall short on domain-specific tasks. Fine-tuning these models typically requires significant manual effort and domain expertise. In this work, we introduce QTT-SEG, a meta-learning-driven approach for automating and accelerating the fine-tuning of SAM for image segmentation. Built on the Quick-Tune hyperparameter optimization framework, QTT-SEG predicts high-performing configurations using meta-learned cost and performance models, efficiently navigating a search space of over 200 million possibilities. We evaluate QTT-SEG on eight binary and five multiclass segmentation datasets under tight time constraints. Our results show that QTT-SEG consistently improves upon SAM's zero-shot performance and surpasses AutoGluon Multimodal, a strong AutoML baseline, on most binary tasks within three minutes. On multiclass datasets, QTT-SEG delivers consistent gains as well. These findings highlight the promise of meta-learning in automating model adaptation for specialized segmentation tasks. Code available at: https://github.com/ds-brx/QTT-SEG/
- North America > Canada > Quebec > Capitale-Nationale Region > Québec (0.04)
- North America > Canada > Quebec > Capitale-Nationale Region > Quebec City (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- (4 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Enhancing Transformer-Based Foundation Models for Time Series Forecasting via Bagging, Boosting and Statistical Ensembles
Time series foundation models (TSFMs) such as Lag-Llama, TimeGPT, Chronos, MOMENT, UniTS, and TimesFM have shown strong generalization and zero-shot capabilities for time series forecasting, anomaly detection, classification, and imputation. Despite these advantages, their predictions still suffer from variance, domain-specific bias, and limited uncertainty quantification when deployed on real operational data. This paper investigates a suite of statistical and ensemble-based enhancement techniques, including bootstrap-based bagging, regression-based stacking, prediction interval construction, statistical residual modeling, and iterative error feedback, to improve robustness and accuracy. Using the Belgium Electricity Short-Term Load Forecasting dataset as a case study, we demonstrate that the proposed hybrids consistently outperform standalone foundation models across multiple horizons. Regression-based ensembles achieve the lowest mean squared error; bootstrap aggregation markedly reduces long-context errors; residual modeling corrects systematic bias; and the resulting prediction intervals achieve near nominal coverage with widths shrinking as context length increases. The results indicate that integrating statistical reasoning with modern foundation models yields measurable gains in accuracy, reliability, and interpretability for real-world time series applications.
- Europe > Belgium (0.27)
- North America > United States > Arizona (0.05)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- (2 more...)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)
AutoML Benchmark with shorter time constraints and early stopping
Jurado, Israel Campero, Gijsbers, Pieter, Vanschoren, Joaquin
Automated Machine Learning (AutoML) automatically builds machine learning (ML) models on data. The de facto standard for evaluating new AutoML frameworks for tabular data is the AutoML Benchmark (AMLB). AMLB proposed to evaluate AutoML frameworks using 1-and 4-hour time budgets across 104 tasks. We argue that shorter time constraints should be considered for the benchmark because of their practical value, such as when models need to be retrained with high frequency, and to make AMLB more accessible. This work considers two ways in which to reduce the overall computation used in the benchmark: smaller time constraints and the use of early stopping. We conduct evaluations of 11 AutoML frameworks on 104 tasks with different time constraints and find the relative ranking of AutoML frameworks is fairly consistent across time constraints, but that using early-stopping leads to a greater variety in model performance. In Machine Learning (ML), manually creating good models is time-consuming and knowledge-intensive. Automated Machine Learning (AutoML) employs efficient automated search methods to create models for new data, often reducing the computational costs in the process Hutter et al. (2019); Hollmann et al. (2022). The AutoML Benchmark (AMLB, Gijsbers et al. 2024) has become the standard for the evaluation of AutoML frameworks on tabular data, greatly increasing reproducibility and comparability in AutoML research. We identified that the time budgets proposed by Gijsbers et al. (2024) were based on what seemed "practically reasonable" at the time, as signified by many frameworks' default time budget of one hour. While the authors motivate evaluating methods on two time budgets as a proxy for anytime performance, they do not motivate the particular choice of 1 hour and 4 hours. AutoML frameworks behave under different time constraints. We conduct similar experiments and analyses for frameworks with early-stopping, offering insights into its potential to reduce energy consumption in AutoML systems. However, we often see that the original benchmarking suite or time constraints (1 hour and 4 hour) are not used as proposed.
- Europe > Netherlands > North Brabant > Eindhoven (0.05)
- Asia > Middle East > Israel (0.04)
- North America > Montserrat (0.04)
- Europe > Belgium > Flanders > East Flanders > Ghent (0.04)
COPA: Comparing the Incomparable to Explore the Pareto Front
Javaloy, Adrián, Vergari, Antonio, Valera, Isabel
In machine learning (ML), it is common to account for multiple objectives when, e.g., selecting a model to deploy. However, it is often unclear how one should compare, aggregate and, ultimately, trade-off these objectives, as they might be measured in different units or scales. For example, when deploying large language models (LLMs), we might not only care about their performance, but also their CO2 consumption. In this work, we investigate how objectives can be sensibly compared and aggregated to navigate their Pareto front. To do so, we propose to make incomparable objectives comparable via their CDFs, approximated by their relative rankings. This allows us to aggregate them while matching user-specific preferences, allowing practitioners to meaningfully navigate and search for models in the Pareto front. We demonstrate the potential impact of our methodology in diverse areas such as LLM selection, domain generalization, and AutoML benchmarking, where classical ways to aggregate and normalize objectives fail.
- Europe > Austria > Vienna (0.14)
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (12 more...)
Tree Species Classification using Machine Learning and 3D Tomographic SAR -- a case study in Northern Europe
Grace, Colverd, Laura, Schade, Jumpei, Takami, Karol, Bot, Joseph, Gallego
Tree species classification plays an important role in nature conservation, forest inventories, forest management, and the protection of endangered species. Over the past four decades, remote sensing technologies have been extensively utilized for tree species classification, with Synthetic Aperture Radar (SAR) emerging as a key technique. In this study, we employed TomoSense, a 3D tomographic dataset, which utilizes a stack of single-look complex (SLC) images, a byproduct of SAR, captured at different incidence angles to generate a three-dimensional representation of the terrain. Our research focuses on evaluating multiple tabular machine-learning models using the height information derived from the tomographic image intensities to classify eight distinct tree species. The SLC data and tomographic imagery were analyzed across different polarimetric configurations and geosplit configurations. We investigated the impact of these variations on classification accuracy, comparing the performance of various tabular machine-learning models and optimizing them using Bayesian optimization. Additionally, we incorporated a proxy for actual tree height using point cloud data from Light Detection and Ranging (LiDAR) to provide height statistics associated with the model's predictions. This comparison offers insights into the reliability of tomographic data in predicting tree species classification based on height.
- Europe > Northern Europe (0.40)
- Europe > Germany > North Rhine-Westphalia (0.14)
UniAutoML: A Human-Centered Framework for Unified Discriminative and Generative AutoML with Large Language Models
Guo, Jiayi, Chen, Zan, Ji, Yingrui, Zhang, Liyun, Luo, Daqin, Li, Zhigang, Shen, Yiqin
Automated Machine Learning (AutoML) has simplified complex ML processes such as data pre-processing, model selection, and hyper-parameter searching. However, traditional AutoML frameworks focus solely on discriminative tasks, often falling short in tackling AutoML for generative models. Additionally, these frameworks lack interpretability and user engagement during the training process, primarily due to the absence of human-centered design. It leads to a lack of transparency in final decision-making and limited user control, potentially reducing trust and adoption of AutoML methods. To address these limitations, we introduce UniAutoML, a human-centered AutoML framework that leverages Large Language Models (LLMs) to unify AutoML for both discriminative (e.g., Transformers and CNNs for classification or regression tasks) and generative tasks (e.g., fine-tuning diffusion models or LLMs). The human-centered design of UniAutoML innovatively features a conversational user interface (CUI) that facilitates natural language interactions, providing users with real-time guidance, feedback, and progress updates for better interpretability. This design enhances transparency and user control throughout the AutoML training process, allowing users to seamlessly break down or modify the model being trained. To mitigate potential risks associated with LLM generated content, UniAutoML incorporates a safety guardline that filters inputs and censors outputs. We evaluated UniAutoML's performance and usability through experiments on eight diverse datasets and user studies involving 25 participants, demonstrating that UniAutoML not only enhances performance but also improves user control and trust. Our human-centered design bridges the gap between AutoML capabilities and user understanding, making ML more accessible to a broader audience.
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.68)
- Information Technology (0.67)
- Health & Medicine (0.46)
PORTAL: Scalable Tabular Foundation Models via Content-Specific Tokenization
Spinaci, Marco, Polewczyk, Marek, Hoffart, Johannes, Kohler, Markus C., Thelin, Sam, Klein, Tassilo
Self-supervised learning on tabular data seeks to apply advances from natural language and image domains to the diverse domain of tables. However, current techniques often struggle with integrating multi-domain data and require data cleaning or specific structural requirements, limiting the scalability of pre-training datasets. We introduce PORTAL (Pretraining One-Row-at-a-Time for All tabLes), a framework that handles various data modalities without the need for cleaning or preprocessing. This simple yet powerful approach can be effectively pre-trained on online-collected datasets and fine-tuned to match state-of-the-art methods on complex classification and regression tasks. This work offers a practical advancement in self-supervised learning for large-scale tabular data.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > France (0.04)
- Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models
Luo, Daqin, Feng, Chengjian, Nong, Yuxuan, Shen, Yiqing
Automated Machine Learning (AutoML) offers a promising approach to streamline the training of machine learning models. However, existing AutoML frameworks are often limited to unimodal scenarios and require extensive manual configuration. Recent advancements in Large Language Models (LLMs) have showcased their exceptional abilities in reasoning, interaction, and code generation, presenting an opportunity to develop a more automated and user-friendly framework. To this end, we introduce AutoM3L, an innovative Automated Multimodal Machine Learning framework that leverages LLMs as controllers to automatically construct multimodal training pipelines. AutoM3L comprehends data modalities and selects appropriate models based on user requirements, providing automation and interactivity. By eliminating the need for manual feature engineering and hyperparameter optimization, our framework simplifies user engagement and enables customization through directives, addressing the limitations of previous rule-based AutoML approaches. We evaluate the performance of AutoM3L on six diverse multimodal datasets spanning classification, regression, and retrieval tasks, as well as a comprehensive set of unimodal datasets. The results demonstrate that AutoM3L achieves competitive or superior performance compared to traditional rule-based AutoML methods. Furthermore, a user study highlights the user-friendliness and usability of our framework, compared to the rule-based AutoML methods.
- Oceania > Australia > Victoria > Melbourne (0.05)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
- Questionnaire & Opinion Survey (1.00)
- Research Report > Experimental Study (0.93)
Squeezing Lemons with Hammers: An Evaluation of AutoML and Tabular Deep Learning for Data-Scarce Classification Applications
Many industry verticals are confronted with small-sized tabular data. In this lowdata regime, it is currently unclear whether the best performance can be expected from simple baselines, or more complex machine learning approaches that leverage meta-learning and ensembling. On 44 tabular classification datasets with sample sizes 500, we find that L2-regularized logistic regression performs similar to state-of-the-art automated machine learning (AutoML) frameworks (AutoPrognosis, AutoGluon) and off-the-shelf deep neural networks (TabPFN, HyperFast) on the majority of the benchmark datasets. We therefore recommend to consider logistic regression as the first choice for data-scarce applications with tabular data and provide practitioners with best practices for further method selection. Machine learning algorithms thrive on data (Banko & Brill, 2001; Halevy et al., 2009).
- Research Report > New Finding (0.75)
- Research Report > Experimental Study (0.61)