kernel mean
MMD-Balls as Credal Sets: A PAC-Bayesian Framework for Epistemic Uncertainty in Test-Time Adaptation
Reliable deployment of machine learning models requires reasoning under epistemic uncertainty--the ability to recognize when the operating distribution has shifted beyond the scope of what was encountered during training. This challenge is central to test-time adaptation (TTA), a paradigm in which a model pretrained on source distribution Ps receives unlabeled data from a target distribution Pt = Ps at deployment time. Existing TTA methods (Wang et al., 2021; Niu et al., 2023; Zhang et al., 2022a; Yuan et al., 2023; Su et al., 2022) improve accuracy under distribution shift by adapting model parameters using statistics computed from test batches, but they provide no formal guarantees about when predictions should be trusted or how much risk degrades as a function of shift magnitude. This gap is particularly concerning in safety-critical applications such as autonomous driving, medical imaging, and financial risk assessment, where a model that silently degrades under distribution shift can cause significant harm. The inability to quantify how wrong a model's predictions might be in an unseen environment fundamentally limits its trustworthy deployment.
Convex-Geometric Error Bounds for Positive-Weight Kernel Quadrature
Kernel quadrature (KQ) is a kernel-based approach to numerical integration, closely related to Bayesian quadrature (BQ) and probabilistic integration [38, 39, 10]. For sufficiently regular integrands, KQ can exploit spectral structure in a reproducing kernel Hilbert space (RKHS) that is invisible to plain Monte Carlo and thereby converge faster than the usual O(N 1/2) rate in the number of points [3, 28]. Unconstrained kernel-based rules, however, may produce numerically unstable weights, motivating longstanding interest in positively weighted rules [13, 21, 29, 46]. In this paper, positive weights mean nonnegative weights that sum to one, i.e., simplex or convex-combination weights. Whether positive-weight KQ can systematically improve over Monte Carlo is a subtle question. Kernel herding and related constructions suggested fast rates under favorable assumptions [13], but the conditional-gradient viewpoint of Bach et al. [4] clarified that the strongest such assumptions are not generally available in infinite-dimensional RKHSs. Subsequent herding-type analyses in broad RKHS settings have therefore mostly remained at the Monte-Carlo scale, except under additional structure or modified algorithms such as sparse herding variants [31, 44, 43]. Beyond herding, subsampling-based positive KQ methods such as thinning [16, 15] and recombination [21, 24] have obtained rates beyond Monte Carlo, but a general mechanism for such improvement in the simple i.i.d.
Nonparametric Distribution Regression Re-calibration
Jung, Ádám, Kelen, Domokos M., Benczúr, András A.
A key challenge in probabilistic regression is ensuring that predictive distributions accurately reflect true empirical uncertainty. Minimizing overall prediction error often encourages models to prioritize informativeness over calibration, producing narrow but overconfident predictions. However, in safety-critical settings, trustworthy uncertainty estimates are often more valuable than narrow intervals. Realizing the problem, several recent works have focused on post-hoc corrections; however, existing methods either rely on weak notions of calibration (such as PIT uniformity) or impose restrictive parametric assumptions on the nature of the error. To address these limitations, we propose a novel nonparametric re-calibration algorithm based on conditional kernel mean embeddings, capable of correcting calibration error without restrictive modeling assumptions. For efficient inference with real-valued targets, we introduce a novel characteristic kernel over distributions that can be evaluated in $\mathcal{O}(n \log n)$ time for empirical distributions of size $n$. We demonstrate that our method consistently outperforms prior re-calibration approaches across a diverse set of regression benchmarks and model classes.
Higher Order Kernel Mean Embeddings to Capture Filtrations of Stochastic Processes
Stochastic processes are random variables with values in some space of paths. However, reducing a stochastic process to a path-valued random variable ignores its filtration, i.e. the flow of information carried by the process through time. By conditioning the process on its filtration, we introduce a family of higher order kernel mean embeddings (KMEs) that generalizes the notion of KME and captures additional information related to the filtration. We derive empirical estimators for the associated higher order maximum mean discrepancies (MMDs) and prove consistency. We then construct a filtration-sensitive kernel two-sample test able to pick up information that gets missed by the standard MMD test. In addition, leveraging our higher order MMDs we construct a family of universal kernels on stochastic processes that allows to solve real-world calibration and optimal stopping problems in quantitative finance (such as the pricing of American options) via classical kernel-based regression methods. Finally, adapting existing tests for conditional independence to the case of stochastic processes, we design a causaldiscovery algorithm to recover the causal graph of structural dependencies among interacting bodies solely from observations of their multidimensional trajectories.
Statistical Optimal Transport posed as Learning Kernel Embedding
The objective in statistical Optimal Transport (OT) is to consistently estimate the optimal transport plan/map solely using samples from the given source and target marginal distributions. This work takes the novel approach of posing statistical OT as that of learning the transport plan's kernel mean embedding from sample based estimates of marginal embeddings. The proposed estimator controls overfitting by employing maximum mean discrepancy based regularization, which is complementary to $\phi$-divergence (entropy) based regularization popularly employed in existing estimators. A key result is that, under very mild conditions, $\epsilon$-optimal recovery of the transport plan as well as the Barycentric-projection based transport map is possible with a sample complexity that is completely dimension-free. Moreover, the implicit smoothing in the kernel mean embeddings enables out-of-sample estimation. An appropriate representer theorem is proved leading to a kernelized convex formulation for the estimator, which can then be potentially used to perform OT even in non-standard domains. Empirical results illustrate the efficacy of the proposed approach.
BayesSum: Bayesian Quadrature in Discrete Spaces
Kang, Sophia Seulkee, Briol, François-Xavier, Karvonen, Toni, Chen, Zonghao
This paper addresses the challenging computational problem of estimating intractable expectations over discrete domains. Existing approaches, including Monte Carlo and Russian Roulette estimators, are consistent but often require a large number of samples to achieve accurate results. We propose a novel estimator, \emph{BayesSum}, which is an extension of Bayesian quadrature to discrete domains. It is more sample efficient than alternatives due to its ability to make use of prior information about the integrand through a Gaussian process. We show this through theory, deriving a convergence rate significantly faster than Monte Carlo in a broad range of settings. We also demonstrate empirically that our proposed method does indeed require fewer samples on several synthetic settings as well as for parameter estimation for Conway-Maxwell-Poisson and Potts models.