Salinas, David
EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks
Arbel, Michael, Salinas, David, Hutter, Frank
However, these models overlook a crucial equivariance property: the arbitrary ordering of target dimensions should not influence model predictions. In this study, we identify this oversight as a source of incompressible error, termed the equivariance gap, which introduces instability in predictions. To mitigate these issues, we propose a novel model designed to preserve equivariance across output dimensions. Our experimental results indicate that our proposed [...]
However, row-order symmetry is not the only symmetry relevant to tabular data. Another key symmetry pertains to feature order, where the arrangement of columns should not influence model predictions. Recent work (Müller et al., 2024; Hollmann et al., 2025) has addressed this challenge by employing bi-attention mechanisms similar to those studied in earlier work (Kossen et al., 2022). This approach alternates attention over rows and columns, making the models equivariant to feature permutations and better suited for handling another inherent symmetry of tabular data.
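To make the bi-attention idea concrete, here is a minimal sketch (not the authors' implementation; module names and dimensions are illustrative) of a block that alternates attention over the columns and rows of an embedded table, together with a check that permuting the columns permutes the output in the same way:

import torch
import torch.nn as nn


class BiAttentionBlock(nn.Module):
    """Alternates attention over the columns and the rows of an embedded table."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.col_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.row_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (rows, cols, dim), one embedding per table cell.
        h, _ = self.col_attn(x, x, x)            # attend across columns; rows act as batch
        x = x + h
        xt = x.transpose(0, 1)                   # (cols, rows, dim)
        h, _ = self.row_attn(xt, xt, xt)         # attend across rows; columns act as batch
        return x + h.transpose(0, 1)


# Attention is permutation-equivariant along its sequence axis, so permuting the
# columns of the input permutes the output in the same way (up to float tolerance).
torch.manual_seed(0)
block = BiAttentionBlock(dim=32).eval()
table = torch.randn(8, 5, 32)                    # 8 rows, 5 columns
perm = torch.randperm(5)
print(torch.allclose(block(table)[:, perm], block(table[:, perm]), atol=1e-5))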
Tuning LLM Judge Design Decisions for 1/1000 of the Cost
Salinas, David, Swelam, Omar, Hutter, Frank
Evaluating Large Language Models (LLMs) often requires costly human annotations. To address this, LLM-based judges have been proposed, which compare the outputs of two LLMs, enabling models to be ranked without human intervention. While several approaches have been proposed, many confounding factors are present across different papers; for instance, the model, the prompt, and other hyperparameters are typically changed at the same time, making apples-to-apples comparisons challenging. In this paper, we propose to systematically analyze and tune the hyperparameters of LLM judges. To alleviate the high cost of evaluating a judge, we propose to leverage multi-objective, multi-fidelity optimization, which finds judges that trade off accuracy against cost and also significantly reduces the cost of the search. Our method identifies judges that not only outperform existing benchmarks in accuracy and cost-efficiency but also use open-weight models, ensuring greater accessibility and reproducibility.
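As a rough illustration of the kind of judge being tuned, here is a minimal pairwise-judge sketch. The prompt template, the position-swap trick, and the judge_pair helper are generic illustrative choices, not the exact design decisions the paper searches over; call_llm stands in for any chat-completion backend:

JUDGE_PROMPT = (
    "You are comparing two answers to the same question.\n"
    "Question: {question}\n"
    "Answer A: {answer_a}\n"
    "Answer B: {answer_b}\n"
    'Which answer is better? Reply with exactly "A" or "B".'
)


def judge_pair(call_llm, question: str, out_1: str, out_2: str) -> int:
    """Return +1 if the first model's output wins, -1 if the second's wins, 0 on a tie.

    `call_llm` is any function mapping a prompt string to the judge's reply. The pair is
    judged twice with positions swapped, a common mitigation for position bias.
    """
    first = call_llm(JUDGE_PROMPT.format(question=question, answer_a=out_1, answer_b=out_2))
    second = call_llm(JUDGE_PROMPT.format(question=question, answer_a=out_2, answer_b=out_1))
    score = ((1 if first.strip().upper().startswith("A") else -1)
             + (1 if second.strip().upper().startswith("B") else -1))
    return (score > 0) - (score < 0)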
The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features
Hoo, Shi Bin, Mรผller, Samuel, Salinas, David, Hutter, Frank
Foundation models have become popular in forecasting due to their ability to make accurate predictions, even with minimal fine-tuning on specific datasets. In this paper, we demonstrate how the newly released regression variant of TabPFN, a general tabular foundation model, can be applied to time series forecasting. We propose a straightforward approach, TabPFN-TS, which pairs TabPFN with simple feature engineering to achieve strong forecasting performance. Despite its simplicity and only 11M parameters, TabPFN-TS outperforms Chronos-Mini, a model of similar size, and matches or even slightly outperforms Chronos-Large, which has 65 times as many parameters. A key strength of our method lies in its reliance solely on artificial data during pre-training, avoiding the need for large training datasets and eliminating the risk of benchmark contamination.
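A minimal sketch of the idea, assuming a scikit-learn style TabPFNRegressor and a regularly spaced series; the calendar features below are illustrative and not necessarily the exact feature set used by TabPFN-TS:

import numpy as np
import pandas as pd


def make_features(index: pd.DatetimeIndex) -> pd.DataFrame:
    """Simple running-index and calendar features, one row per timestamp."""
    return pd.DataFrame({
        "running_index": np.arange(len(index)),  # captures trend
        "day_of_week": index.dayofweek,
        "day_of_month": index.day,
        "month": index.month,
        "hour": index.hour,
    }, index=index)


def forecast(series: pd.Series, horizon: int) -> pd.Series:
    # Assumes `series` has a DatetimeIndex with a regular frequency set.
    from tabpfn import TabPFNRegressor  # assumed scikit-learn style fit/predict interface

    future_index = pd.date_range(series.index[-1], periods=horizon + 1,
                                 freq=series.index.freq)[1:]
    full_features = make_features(series.index.append(future_index))
    X_train, X_test = full_features.iloc[:len(series)], full_features.iloc[len(series):]

    model = TabPFNRegressor()
    model.fit(X_train.values, series.values)
    return pd.Series(model.predict(X_test.values), index=future_index)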
Mamba4Cast: Efficient Zero-Shot Time Series Forecasting with State Space Models
Bhethanabhotla, Sathya Kamesh, Swelam, Omar, Siems, Julien, Salinas, David, Hutter, Frank
This paper introduces Mamba4Cast, a zero-shot foundation model for time series forecasting. Based on the Mamba architecture and inspired by Prior-data Fitted Networks (PFNs), Mamba4Cast generalizes robustly across diverse time series tasks without the need for dataset-specific fine-tuning. Mamba4Cast's key innovation lies in its ability to achieve strong zero-shot performance on real-world datasets while having much lower inference times than time series foundation models based on the transformer architecture. Trained solely on synthetic data, the model generates forecasts for entire horizons in a single pass, outpacing traditional auto-regressive approaches. Our experiments show that Mamba4Cast performs competitively against other state-of-the-art foundation models across various datasets while scaling significantly better with the prediction length.
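The contrast between autoregressive decoding and the single-pass, full-horizon decoding mentioned above can be sketched as follows; `model` is a hypothetical forecaster interface, not Mamba4Cast's actual API:

import torch


def forecast_autoregressive(model, context: torch.Tensor, horizon: int) -> torch.Tensor:
    """One model call per future step; each prediction is appended to the context."""
    history = context.clone()
    steps = []
    for _ in range(horizon):
        next_step = model(history)[..., -1:]     # model predicts one step ahead
        steps.append(next_step)
        history = torch.cat([history, next_step], dim=-1)
    return torch.cat(steps, dim=-1)


def forecast_single_pass(model, context: torch.Tensor, horizon: int) -> torch.Tensor:
    """One model call for the whole horizon, as in the zero-shot setup described above."""
    return model(context, horizon=horizon)       # expected shape: (..., horizon)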
GAMformer: In-Context Learning for Generalized Additive Models
Mueller, Andreas, Siems, Julien, Nori, Harsha, Salinas, David, Zela, Arber, Caruana, Rich, Hutter, Frank
Generalized Additive Models (GAMs) are widely recognized for their ability to create fully interpretable machine learning models for tabular data. Traditionally, training GAMs involves iterative learning algorithms, such as splines, boosted trees, or neural networks, which refine the additive components through repeated error reduction. In this paper, we introduce GAMformer, the first method to leverage in-context learning to estimate shape functions of a GAM in a single forward pass, representing a significant departure from the conventional iterative approaches to GAM fitting. Building on previous research applying in-context learning to tabular data, we exclusively use complex, synthetic data to train GAMformer, yet find it extrapolates well to real-world data. Our experiments show that GAMformer performs on par with other leading GAMs across various classification benchmarks while generating highly interpretable shape functions.
The growing importance of interpretability in machine learning is evident, especially in areas where transparency, fairness, and accountability are critical (Barocas and Selbst, 2016; Rudin et al., 2022). Interpretable models are essential for building trust between humans and AI systems by allowing users to understand the reasoning behind the model's predictions and decisions (Ribeiro et al., 2016). This is crucial in safety-critical fields like healthcare, where incorrect or biased decisions can have severe consequences (Caruana et al., 2015). Additionally, interpretability is vital for regulatory compliance in sectors like finance and hiring, where explaining and justifying model outcomes is necessary (Arun et al., 2016; Dattner et al., 2019). Interpretable models also help detect and mitigate bias by revealing the factors influencing predictions, ensuring fair and unbiased decisions across different population groups (Mehrabi et al., 2021). Generalized Additive Models (GAMs) have proven to be a popular choice for interpretable modeling due to their high accuracy and interpretability. In GAMs, the target variable is expressed as a sum of non-linearly transformed features.
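For readers unfamiliar with GAMs, the following sketch shows how a fitted GAM predicts: each feature passes through its own shape function and the contributions are summed. The piecewise-constant (binned) representation is just one common way to store shape functions; GAMformer's contribution is producing them in a single forward pass rather than by iterative fitting:

import numpy as np


def gam_predict(X: np.ndarray, bin_edges: list, shape_values: list,
                intercept: float = 0.0) -> np.ndarray:
    """X: (n_samples, n_features); one piecewise-constant shape function per feature,
    given by its bin edges and the value of f_j in each bin."""
    prediction = np.full(X.shape[0], intercept, dtype=float)
    for j in range(X.shape[1]):
        bins = np.clip(np.searchsorted(bin_edges[j], X[:, j]) - 1,
                       0, len(shape_values[j]) - 1)
        prediction += shape_values[j][bins]      # add f_j(x_j) for every sample
    return prediction


# Toy example: y ~ f_0(x_0) + f_1(x_1) with two hand-written shape functions.
edges = [np.array([0.0, 1.0, 2.0]), np.array([0.0, 0.5, 1.0])]
values = [np.array([-1.0, 1.0]), np.array([0.2, -0.2])]
print(gam_predict(np.array([[0.3, 0.7], [1.5, 0.1]]), edges, values))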
ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning
Becktepe, Jannis, Dierkes, Julian, Benjamins, Carolin, Mohan, Aditya, Salinas, David, Rajan, Raghu, Hutter, Frank, Hoos, Holger, Lindauer, Marius, Eimer, Theresa
Hyperparameters are a critical factor in reliably training well-performing reinforcement learning (RL) agents. Unfortunately, developing and evaluating automated approaches for tuning such hyperparameters is both costly and time-consuming. As a result, such approaches are often only evaluated on a single domain or algorithm, making comparisons difficult and limiting insights into their generalizability. We propose ARLBench, a benchmark for hyperparameter optimization (HPO) in RL that allows comparisons of diverse HPO approaches while being highly efficient in evaluation. To enable research into HPO in RL, even in settings with low compute resources, we select a representative subset of HPO tasks spanning a variety of algorithm and environment combinations. This selection allows for generating a performance profile of an automated RL (AutoRL) method using only a fraction of the compute previously necessary, enabling a broader range of researchers to work on HPO in RL. With the extensive and large-scale dataset on hyperparameter landscapes that our selection is based on, ARLBench is an efficient, flexible, and future-oriented foundation for research on AutoRL. Both the benchmark and the dataset are available at https://github.com/automl/arlbench.
Deep Non-Parametric Time Series Forecaster
Rangapuram, Syama Sundar, Gasthaus, Jan, Stella, Lorenzo, Flunkert, Valentin, Salinas, David, Wang, Yuyang, Januschowski, Tim
This paper presents non-parametric baseline models for time series forecasting. Unlike classical forecasting models, the proposed approach does not assume any parametric form for the predictive distribution and instead generates predictions by sampling from the empirical distribution according to a tunable strategy. By virtue of this, the model is always able to produce reasonable forecasts (i.e., predictions within the observed data range) without fail, unlike classical models that suffer from numerical instability on some data distributions. Moreover, we develop a global version of the proposed method that automatically learns the sampling strategy by exploiting the information across multiple related time series. The empirical evaluation shows that the proposed methods have reasonable and consistent performance across all datasets, making them strong baselines to consider in one's forecasting toolbox.
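A minimal sketch of the non-parametric idea, with an exponential recency weighting standing in for the paper's tunable sampling strategy (the actual strategy, and its learned global variant, differ):

import numpy as np


def sample_forecasts(history: np.ndarray, horizon: int, num_samples: int = 100,
                     recency: float = 0.05, seed=None) -> np.ndarray:
    """Draw (num_samples, horizon) sample paths from the empirical distribution of
    `history`, with more weight on recent observations."""
    rng = np.random.default_rng(seed)
    ages = np.arange(len(history))[::-1]          # age 0 = most recent observation
    weights = np.exp(-recency * ages)
    weights /= weights.sum()
    idx = rng.choice(len(history), size=(num_samples, horizon), p=weights)
    return history[idx]                           # forecasts always lie in the observed range


# Per-step quantiles of the sample paths give a predictive distribution.
paths = sample_forecasts(np.array([10.0, 12.0, 9.0, 11.0, 13.0]), horizon=3, seed=0)
print(np.quantile(paths, [0.1, 0.5, 0.9], axis=0))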
TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications
Salinas, David, Erickson, Nick
We introduce TabRepo, a new dataset of tabular model evaluations and predictions. TabRepo contains the predictions and metrics of 1206 models evaluated on 200 classification and regression datasets. We illustrate the benefit of our dataset in multiple ways. First, we show that it allows analyses such as comparing hyperparameter optimization against current AutoML systems, while also considering ensembling at no cost by using precomputed model predictions. Second, we show that our dataset can be readily leveraged to perform transfer learning. In particular, we show that applying standard transfer-learning techniques allows us to outperform current state-of-the-art tabular systems in accuracy, runtime, and latency.
Machine learning on structured tabular data has a long history due to its wide range of practical applications. Significant progress has been achieved through improving supervised learning models, with key method landmarks including SVMs (Hearst et al., 1998), Random Forests (Breiman, 2001), and Gradient Boosted Trees (Friedman, 2001). While the performance of base models is still being improved by a steady stream of research, it has saturated, and state-of-the-art methods now leverage AutoML techniques (He et al., 2021) or new paradigms such as the pretraining of transformer models (Hollmann et al., 2022). AutoML solutions currently dominate tabular prediction benchmarks (Erickson et al., 2020; Gijsbers et al., 2022). Auto-Sklearn (Feurer et al., 2015a; 2020) was an early approach that proposed to select pipelines to ensemble from the Sklearn library and to meta-learn hyperparameter optimization (HPO) with offline evaluations.
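The "ensembling at no cost" point can be illustrated with greedy ensemble selection in the style of Caruana et al.: because validation predictions are precomputed, building an ensemble only averages cached arrays and never retrains a model. Names below are illustrative, not the TabRepo API:

import numpy as np


def greedy_ensemble(cached_preds: dict, y_val: np.ndarray, metric, rounds: int = 10) -> list:
    """Greedy ensemble selection over cached validation predictions: repeatedly add the
    model whose inclusion most improves the averaged prediction. Selecting a model
    several times implicitly gives it a larger weight."""
    selected, running_sum = [], np.zeros_like(y_val, dtype=float)
    for _ in range(rounds):
        scores = {name: metric(y_val, (running_sum + preds) / (len(selected) + 1))
                  for name, preds in cached_preds.items()}
        best = min(scores, key=scores.get)
        running_sum += cached_preds[best]
        selected.append(best)
    return selected


# Toy usage with a root-mean-squared-error metric.
rmse = lambda y, p: float(np.sqrt(np.mean((y - p) ** 2)))
y = np.array([1.0, 2.0, 3.0])
cached = {"m1": np.array([1.1, 2.2, 2.7]), "m2": np.array([0.8, 1.9, 3.3])}
print(greedy_ensemble(cached, y, rmse, rounds=5))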
Obeying the Order: Introducing Ordered Transfer Hyperparameter Optimisation
Hellan, Sigrid Passano, Shen, Huibin, Aubet, François-Xavier, Salinas, David, Klein, Aaron
We introduce ordered transfer hyperparameter optimisation (OTHPO), a version of transfer learning for hyperparameter optimisation (HPO) in which the tasks follow a sequential order. Unlike in state-of-the-art transfer HPO, the assumption is that each task is most correlated with those immediately before it. This matches many deployed settings, where hyperparameters are retuned as more data is collected; for instance, tuning a sequence of movie recommendation systems as more movies and ratings are added. We propose a formal definition, outline the differences from related problems, and propose a basic OTHPO method that outperforms state-of-the-art transfer HPO. We empirically show the importance of taking order into account using ten benchmarks. The benchmarks are in the setting of gradually accumulating data and span XGBoost, random forest, approximate k-nearest neighbours, elastic net, support vector machines, and a separate real-world-motivated optimisation problem. We open-source the benchmarks to foster future research on ordered transfer HPO.
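One simple way to exploit the sequential order, sketched below, is to seed each task's search with the best configurations from the immediately preceding task before spending the remaining budget on fresh samples; this illustrates the setting rather than reproducing the paper's exact method:

def ordered_transfer_hpo(tasks, sample_config, evaluate, budget: int = 20, n_warm: int = 3):
    """tasks: tasks in temporal order; evaluate(task, config) -> validation error;
    sample_config() -> a random configuration. Yields (task, (best_error, best_config))."""
    previous_ranking = []
    for task in tasks:
        warm = previous_ranking[:n_warm]          # best configs of the preceding task
        candidates = warm + [sample_config() for _ in range(budget - len(warm))]
        scored = sorted(((evaluate(task, c), c) for c in candidates), key=lambda t: t[0])
        previous_ranking = [c for _, c in scored]
        yield task, scored[0]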
Optimizing Hyperparameters with Conformal Quantile Regression
Salinas, David, Golebiowski, Jacek, Klein, Aaron, Seeger, Matthias, Archambeau, Cedric
Many state-of-the-art hyperparameter optimization (HPO) algorithms rely on model-based optimizers that learn surrogate models of the target function to guide the search. Gaussian processes are the de facto surrogate model due to their ability to capture uncertainty, but they make strong assumptions about the observation noise, which might not be warranted in practice. In this work, we propose to leverage conformalized quantile regression, which makes minimal assumptions about the observation noise and, as a result, models the target function in a more realistic and robust fashion, translating to quicker HPO convergence on empirical benchmarks. To apply our method in a multi-fidelity setting, we propose a simple yet effective technique that aggregates observed results across different resource levels and outperforms conventional methods across many empirical tasks.
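A minimal sketch of a conformalized quantile surrogate in the spirit described above (not the authors' implementation): fit lower and upper quantile models, then widen the interval by a split-conformal offset computed on held-out observations:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def conformal_quantile_interval(X_train, y_train, X_cal, y_cal, X_new, alpha: float = 0.1):
    """Return (lower, upper) predictive bounds for X_new with roughly (1 - alpha) coverage."""
    lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, y_train)
    hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, y_train)

    # Conformity scores: how far calibration targets fall outside the predicted interval.
    scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
    # Finite-sample corrected quantile of the scores (standard split-conformal recipe).
    level = min(1.0, np.ceil((1 - alpha) * (len(y_cal) + 1)) / len(y_cal))
    offset = np.quantile(scores, level)

    return lo.predict(X_new) - offset, hi.predict(X_new) + offset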