Goto

Collaborating Authors

 extrapolation


Learning to Extrapolate to New Tasks: A Relational Approach to Task Extrapolation

arXiv.org Machine Learning

Modern learning systems excel at interpolation but struggle to generalize to unseen tasks outside the training distribution's support. This failure occurs even in simple settings, such as handling task parameters beyond the training range, and persists despite advances in foundation models. To this end, we develop the Relational Task Extrapolator (RTE), an algorithm designed to enable systematic extrapolation to novel tasks. The key observation is that extrapolation is inherently relational: extrapolating to unseen tasks requires learning how tasks transform into one another. If a model learns the transformation between tasks A and B during training, it can apply that same transformation to relate known tasks to unseen ones at test time. RTE operationalizes this idea by decomposing each target task into a known anchor task and a transformation linking the anchor and target. It then learns a relational operator, mapping an anchor-transformation pair to predictions for the target task. We instantiate RTE across multiple task extrapolation regimes in function prediction, e.g. where target tasks use out-of-range parameters (parameter extrapolation), have greater compositional depth (length extrapolation), and/or recombine function primitives in unseen ways (compositional extrapolation). We further extend RTE to sequence prediction, integrating it into fine-tuning algorithms for foundation models. Across empirical studies, we find that RTE substantially outperforms existing approaches on extrapolation to novel, unseen tasks.


FLUXtrapolation: A benchmark on extrapolating ecosystem fluxes

arXiv.org Machine Learning

We introduce FLUXtrapolation, a benchmark for extrapolating ecosystem fluxes under progressively harder distribution shifts. Ecosystem fluxes are central to understanding the carbon, water, and energy cycles, yet they can only be measured directly at sparsely located measurement towers. Producing global flux estimates therefore requires training models on observed sites using globally available covariates and predicting in unobserved regions, that is, upscaling. Flux upscaling is a challenging domain generalization problem that is affected by a shift in covariate distribution across climates, ecosystem types, and environmental conditions, as well as by conditional shift: important drivers remain unobserved at global scale. We provide a quantitative analysis of both these shifts in $P_X$ and $P_{Y\mid X}$. FLUXtrapolation is designed based on domain expertise on flux upscaling: it defines temporal, spatial, and temperature-based extrapolation scenarios and evaluates performance across held-out domains, temporal aggregations, and tail errors. In a pilot study, we find that baselines perform similarly under median hourly RMSE, but separate under the proposed tail-focused and multi-scale evaluation. FLUXtrapolation therefore poses a realistic and thus relevant challenge for machine learning methods under distribution shift; at the same time, progress on this benchmark would directly support the scientific goal of improving flux upscaling.


Extrapolation in Statistical Learning with Extreme Value Theory

arXiv.org Machine Learning

Extreme value theory provides rigorous theory and statistical tools for extrapolation in machine learning, particularly in settings where traditional methods struggle due to data scarcity in the tails. A broad range of tasks benefit from these advances, including regression and classification beyond the training data, extreme quantile regression, supervised and unsupervised dimension reduction, generative artificial intelligence and anomaly detection. This review synthesizes recent developments in these fields at the intersection of statistical learning and extreme value theory, with a focus on principled methods based on asymptotically motivated representations of the tail of univariate and multivariate distributions. We consider different theoretical frameworks for both asymptotically dependent and independent data and discuss how they translate into efficient statistical methods for extrapolation to extreme regions. By addressing both theoretical and practical aspects, we offer a comprehensive overview of the state-of-the-art in this quickly evolving field, and identify promising directions for future research.


Regularized Nonlinear Acceleration

Neural Information Processing Systems

We describe a convergence acceleration technique for generic optimization problems. Our scheme computes estimates of the optimum from a nonlinear average of the iterates produced by any optimization method. The weights in this average are computed via a simple and small linear system, whose solution can be updated online. This acceleration scheme runs in parallel to the base algorithm, providing improved estimates of the solution on the fly, while the original optimization method is running. Numerical experiments are detailed on classical classification problems.


Debiasing Conditional Stochastic Optimization

Neural Information Processing Systems

In this paper, we study the conditional stochastic optimization (CSO) problem which covers a variety of applications including portfolio selection, reinforcement learning, robust learning, causal inference, etc. The sample-averaged gradient of the CSO objective is biased due to its nested structure, and therefore requires a high sample complexity for convergence. We introduce a general stochastic extrapolation technique that effectively reduces the bias. We show that for nonconvex smooth objectives, combining this extrapolation with variance reduction techniques can achieve a significantly better sample complexity than the existing bounds. Additionally, we develop new algorithms for the finite-sum variant of the CSO problem that also significantly improve upon existing results. Finally, we believe that our debiasing technique has the potential to be a useful tool for addressing similar challenges in other stochastic optimization problems.



Trustworthy Feature Importance Avoids Unrestricted Permutations

arXiv.org Machine Learning

Since their introduction by Breiman (2001), permutation-based feature importance measures have been widely adopted. However, randomly permuting the entries of a dataset may create new points far from the original data or even "impossible data." In a permuted dataset, we may find children who are retired or individuals who graduated from high school before they were born (Mase et al. 2022, p. 1). Forcing ML models to make predictions at these points causes them to extrapolate, making explanations unreliable (Hooker et al. 2021). Every non-trivial permutation-based variable importance measure, including SHAP (Lundberg and Lee 2017), Knockoffs (Barber and Candés 2015), conditional model reliance (Fisher et al. 2019), and accumulated local effect (ALE) plots (Apley and Zhu 2020) suffer from this. We propose and compare three new strategies to address extrapolation issues. The first combines conditional model reliance from Fisher et al. (2019) with a Gaussian transformation. By mapping data quantiles to a Gaussian distribution and back, we adjust only the quantiles of point values, significantly reducing extrapolation. Under a Gaussian copula assumption for the feature distribution, we prove that the new data points follow the same probability distribution as the original data.


Fréchet Regression on the Bures-Wasserstein Manifold

arXiv.org Machine Learning

Fréchet regression, or conditional Barycenters, is a flexible framework for modeling relationships between covariates (usually Euclidean) and response variables on general metric spaces, e.g., probability distributions or positive definite matrices. However, in contrast to classical barycenter problems, computing conditional counterparts in many non-Euclidean spaces remains an open challenge, as they yield non-convex optimization problems with an affine structure. In this work, we study the existence and computation of conditional barycenters, specifically in the space of positive-definite matrices with the Bures-Wasserstein metric. We provide a sufficient condition for the existence of a minimizer of the conditional barycenter problem that characterizes the regression range of extrapolation. Moreover, we further characterize the optimization landscape, proving that under this condition, the objective is free of local maxima. Additionally, we develop a projection-free and provably correct algorithm for the approximate computation of first-order stationary points. Finally, we provide a stochastic reformulation that enables the use of off-the-shelf stochastic Riemannian optimization methods for large-scale setups. Numerical experiments validate the performance of the proposed methods on regression problems of real-world biological networks and on large-scale synthetic Diffusion Tensor Imaging problems.



Diversified Outlier Exposure for Out-of-Distribution Detection via Informative Extrapolation

Neural Information Processing Systems

Out-of-distribution (OOD) detection is important for deploying reliable machine learning models on real-world applications. Recent advances in outlier exposure have shown promising results on OOD detection via fine-tuning model with informatively sampled auxiliary outliers. However, previous methods assume that the collected outliers can be sufficiently large and representative to cover the boundary between ID and OOD data, which might be impractical and challenging. In this work, we propose a novel framework, namely, Diversified Outlier Exposure (DivOE), for effective OOD detection via informative extrapolation based on the given auxiliary outliers. Specifically, DivOE introduces a new learning objective, which diversifies the auxiliary distribution by explicitly synthesizing more informative outliers for extrapolation during training. It leverages a multi-step optimization method to generate novel outliers beyond the original ones, which is compatible with many variants of outlier exposure. Extensive experiments and analyses have been conducted to characterize and demonstrate the effectiveness of the proposed DivOE.