Country
Active Multiple-Prediction-Powered Inference
Brawand, Nicholas, Leclerc, Nima, Ngo, Anhthy, Peterson, Matthew, Vishwanath, Sriram, Alhussein, Laith, Wellner, Ben
Post-deployment monitoring of healthcare AI requires statistically valid, label-efficient methods, but gold-standard labels from clinician chart review are expensive. Prediction-powered inference (PPI) and active statistical inference (ASI) reduce label cost by combining a small labeled sample with abundant model predictions, but both are restricted to a single predictor, a poor fit for modern clinical pipelines that have multiple predictors of differing cost and accuracy available at inference time. We propose Active Multiple-Prediction-Powered Inference (AM-PPI), which routes each instance to a cost-appropriate predictor subset, samples gold-standard labels in proportion to the chosen subset's residual uncertainty, and reweights predictions to minimize estimator variance, all under a single deployment-time budget. AM-PPI generalizes ASI to leverage multiple predictors and extends Multiple-PPI from global per-predictor allocation to per-instance adaptive routing. We derive closed-form Karush-Kuhn-Tucker (KKT) conditions for all three decisions and prove, via biconvexity and strong duality, that the resulting fixed point is a global optimum despite the joint problem being non-jointly-convex. We establish asymptotic normality with valid coverage, minimum-variance unbiasedness within the linear-prediction augmented inverse propensity weighted (AIPW) class, and a closed-form criterion identifying when multiple predictors help. On synthetic data and three healthcare monitoring tasks, AM-PPI produces 10 to 40 percent narrower confidence intervals (CIs) than single-predictor ASI in the budget regime where routing matters, and matches the better baseline elsewhere.
A Semantic-Sampling Framework for Evaluating Calibration in Open-Ended Question Answering
Wang, Zhanliang, Xiao, Jiancong, Jin, Ruochen, Yang, Shu, Hou, Bojian, Shen, Li
Calibration measures whether a model's predicted confidence aligns with its empirical accuracy, and is central to the reliable deployment of large language models (LLMs) in high-stakes domains such as medicine and law. While much recent work focuses on improving LLM calibration, the equally important question of how to evaluate it in realistic settings remains underdeveloped. Open-ended question answering (QA), the most common deployment setting for modern LLMs, is where existing evaluation methods fall short: logit-based metrics need restricted output formats and internal probabilities; verbalized confidence is self-reported and often overconfident; and sampling-based methods rely on task-specific extraction rules without a clear finite-sample target. We introduce Sem-ECE (Semantic-Sampling Expected Calibration Error), a calibration evaluation framework for open-ended QA that samples answers from the model, groups them into semantic classes, and uses the resulting frequencies as confidence. We study two estimators within this framework: Sem$_1$-ECE, the same-sample self-consistency score, and Sem$_2$-ECE, a held-out variant that separates answer selection from confidence evaluation. We prove both are asymptotically unbiased, and further show that they agree on easy questions but diverge on hard ones with Sem$_2$ achieving strictly smaller calibration error, so their gap also serves as a diagnostic for question difficulty. Experiments on three open-ended QA benchmarks across five leading commercial LLMs match our theoretical predictions and show that Sem-ECE outperforms verbalized confidence and existing sampling-based methods, while complementing logit-based evaluation when internal probabilities are unavailable.
Sinkhorn Treatment Effects: A Causal Optimal Transport Measure
We introduce the Sinkhorn treatment effect, an entropic optimal transport measure of divergence between counterfactual distributions. Unlike classical quantities such as the average treatment effect, this measure captures differences across entire distributions. We analyze this divergence as a statistical functional and show it can be written as a smooth transformation of counterfactual mean embeddings with an appropriate kernel. This characterization allows us to establish first-order pathwise differentiability in general, and second-order pathwise differentiability under the null hypothesis of equal counterfactual distributions. Leveraging this smoothness, we construct debiased estimators and use them to obtain asymptotically valid tests for distributional treatment effects with a fixed entropic regularization parameter. Because the power of the test depends on this unknown parameter, we further propose an aggregated test that combines evidence across a grid of regularization choices. Experiments on simulated and image data demonstrate the practical advantages of our estimator and testing procedure.
Sliced Inner Product Gromov-Wasserstein Distances
Gong, Xiaoyun, Rioux, Gabriel, Goldfeld, Ziv
The Gromov-Wasserstein (GW) problem provides a framework for aligning heterogeneous datasets by matching their intrinsic geometry, but its statistical and computational scaling remains an issue for high-dimensional problems. Slicing techniques offer an appealing route to scalability, but, unlike Wasserstein distances, GW problems do not generally admit closed-form solutions in one-dimension. We resolve this problem for the GW problem with inner product cost (IGW), propose a sliced IGW distance that enjoys a natural rotational invariance property, and comprehensively study its structural and computational properties. Numerical experiments validating our theory are presented, followed by applications to heterogeneous clustering of text data and language model representation comparison.
CONTRA: Conformal Prediction Region via Normalizing Flow Transformation
Fang, Zhenhan, Tan, Aixin, Huang, Jian
Density estimation and reliable prediction regions for outputs are crucial in supervised and unsupervised learning. While conformal prediction effectively generates coverage-guaranteed regions, it struggles with multi-dimensional outputs due to reliance on one-dimensional nonconformity scores. To address this, we introduce CONTRA: CONformal prediction region via normalizing flow TRAnsformation. CONTRA utilizes the latent spaces of normalizing flows to define nonconformity scores based on distances from the center. This allows for the mapping of high-density regions in latent space to sharp prediction regions in the output space, surpassing traditional hyperrectangular or elliptical conformal regions. Further, for scenarios where other predictive models are favored over flow-based models, we extend CONTRA to enhance any such model with a reliable prediction region by training a simple normalizing flow on the residuals. We demonstrate that both CONTRA and its extension maintain guaranteed coverage probability and outperform existing methods in generating accurate prediction regions across various datasets. We conclude that CONTRA is an effective tool for (conditional) density estimation, addressing the under-explored challenge of delivering multi-dimensional prediction regions.
Posterior Concentration of Bayesian Physics-Informed Neural Networks for Elliptic PDEs
Unlike a standard PINN--which produces an approximate Deep neural networks (DNNs) or multi-layer perceptronssolution by minimizing a PDE-residual loss and thus yields (MLPs) offer various inherent advantages over traditionalonly a point estimate, failing to quantify uncertainty inapproaches of scientific computing and data analysis, suchduced by noisy or limited data, a Bayesian PINN returns a as finite element methods, wavelets and kernel methods, full posterior distribution over solutions by combining the which are often hampered by the irregular and nonlinearuncertain information from the likelihood (data) and the data structures and the high input dimensions. In contrast, DNNs are capable of approximating a rich class of functions prior. Bayesian neural networks, originating in the seminal works of MacKay (MacKay, 1995) and Neal (Neal, 1995), with aforementioned complexities and can also easily en-have been extensively studied over the past three decades codes additional complex physical structures, such as sym- (Lampinen & Vehtari, 2001; Titterington, 2004; Graves, metry and other invariant structures.
Core-Halo Decomposition: Decentralizing Large-Scale Fixed-Point Problems
Haixiang, null, Xu, Yang, Zhang, Jiefu, Wu, Xudong, Zhou, Zihan, He, Jun, Chen, Jiayu
We study solving large-scale fixed-point equation x = F(x) with decomposition. Standard strict decomposition assigns each agent a disjoint block and evaluates updates using only owned coordinates. For most operators, however, a block update may depend on variables outside the block. Truncating these dependencies by strict decomposition changes the mean operator and creates structural bias that cannot be removed by more samples, smaller stepsizes, or additional consensus. We therefore propose Core-Halo decomposition, which separates write ownership from read-only evaluation context: each agent updates its own core and reads from an overlapping halo. By aligning the Core-Halo decomposition with the blockdependence structure of F, the original fixed-point problem can be implemented faithfully in a decentralized multi-agent system. We further characterize the fundamental obstruction faced by strict decomposition through a Bellman closure condition and a blockwise bias lower bound, showing that local-only updates can alter the original fixed-point operator. Finally, we conduct extensive experiments across a range of application settings, and demonstrate that Core-Halo achieves near-centralized performance while retaining the parallelism benefits of decentralization.
Minimax Optimal Estimation of Transport-Growth Pairs in Unbalanced Optimal Transport
Ponnoprat, Donlapark, Isobe, Noboru, Imaizumi, Masaaki
Unbalanced optimal transport (UOT) extends classical optimal transport to measures with different total masses, but statistical guarantees for Monge-type estimation remain limited. We study unbalanced transport with quadratic cost and Kullback-Leibler marginal penalties and argue that the natural population target is not a map alone, but a transport-growth pair. Consequently, we develop two estimators for the transport-growth pairs under several setups: an optimal transport plan-based estimator for a general case, and a kernel-based estimator for a case with smooth densities. We also show that an error of the estimator achieves the minimax optimal rate by deriving a matching lower bound of the minimax risk. Our main technical contribution is a value-based stability reduction that converts perturbations of the UOT objective into transport and growth risks through a UOT gap condition. These results provide a statistical foundation for Monge-type estimation in unbalanced optimal transport.
Simultaneous Monitoring of Shape and Surface Color via 4D Point Clouds: A Registration-free Approach
Patalano, Mariafrancesca, Capizzi, Giovanna, Paynabar, Kamran
Advanced manufacturing technologies allow for the production of intricate parts featuring high shape complexity and spatially-varying material composition. Data fusion of point clouds with chromatic attributes provides 4D point clouds, a compact and informative representation that encodes both shape and material information. In this paper, we present a registration-free framework for Simultaneous Monitoring of shApe and Color (SMAC) via 4D point clouds. The proposed framework leverages Laplace-Beltrami operator spectral properties to capture and monitor geometric features and the relationship between shape and surface color. A combined monitoring scheme is proposed to effectively detect shape deformations and color anomalies, along with a spatially-aware post-signal diagnostic procedure to determine the source of change and localize color anomalies. Importantly, neither component relies on registration or mesh reconstruction, eliminating error-prone and computationally expensive preprocessing steps. A Monte Carlo simulation study and a case study on functionally graded materials demonstrate that SMAC achieves effective detection performance, particularly for subtle defects, while providing diagnostic capabilities to identify the source and location of anomalies.
Local LMO: Constrained Gradient Optimization via a Local Linear Minimization Oracle
Richtárik, Peter, Gruntkowska, Kaja, Li, Hanmin
We design Local LMO - a new projection-free gradient-type method for constrained optimization. The key algorithmic idea is to replace the global linear minimization oracle over the constraint set used by Frank-Wolfe (FW) with a local linear minimization oracle over the intersection of the constraint set and a "small" ball centered at the current iterate. In particular, when minimizing $f:\mathbb{R}^d\to \mathbb{R}$ over a constraint $\emptyset\neq\mathcal{X}\subseteq\mathbb{R}^d$, Local LMO performs the iteration \[x_{k+1}\in \arg\min_{z\in\mathcal{X}\cap\mathcal{B}(x_{k},t_k)}\langle\nabla f(x_{k}), z \rangle,\] where $x_0\in\mathcal{X}$, and $t_k>0$ is a suitably chosen radius which can be interpreted as an effective stepsize. While designed as an alternative to FW, Local LMO is perhaps best viewed as a generalization of Gradient Descent (GD) rather than a modification of FW. Indeed, it is easy to see that Local LMO reduces to GD in the unconstrained setting and, more generally, to GD restricted to an affine subspace if the constraint $\mathcal{X}$ is affine. We prove that this simple algorithmic scheme transfers the known (unaccelerated) convergence rates of Projected Gradient Descent (PGD) to the projection-free world in several important regimes, some of which are beyond the reach of FW. In contrast to FW theory, i) our guarantees hold without requiring the feasible set $\mathcal{X}$ to be bounded, ii) our theory does not require the "curvature" assumption, which allows us to establish a standard sublinear rate for convex functions with bounded gradients, iii) we obtain a linear rate in the smooth strongly convex regime. Furthermore, we obtain sharp sublinear rates in the smooth convex and non-convex regimes, in the $(L_0,L_1)$-smooth convex regime, and in stochastic and non-differentiable settings.