Genre
Identifiability and Estimation for Unlabeled Finite Mixtures under Marginal Independence
Kanamori, Takafumi, Hirose, Yushi, Yamamoto, Shohei
We study component recovery and mixing-matrix estimation from unlabeled finite mixtures whose observable distributions share the same latent components but have unknown mixing weights. The main identifying signal is marginal independence: each component is assumed to be independent on at least one coordinate pair, but no labels, clean component samples, or mixing weights are observed. We first prove a structural result for product components: under linear independence of the univariate marginals, any independent affine combination of the components must coincide with a single component. We then extend this principle to observable mixtures and show that, under full-rank and no-cancellation conditions, marginally independent affine combinations recover the corresponding latent components. When every component is independent on some coordinate pair, all components are identifiable, and the mixing matrix is recoverable under the stated completion conditions. Finally, we propose a Product-Marginal Maximum Mean Discrepancy (PM-MMD) estimator over affine combinations of the observable mixtures and prove uniform convergence and stability under approximate marginal independence. This framework also separates the empirical roles of the assumptions: irreducibility is, in general, not directly testable from the unlabeled mixtures alone, whereas marginal independence yields a candidate-level diagnostic through held-out PM-MMD. Controlled and flow-cytometry experiments show when marginal independence provides a useful recovery signal. In the reported multi-component comparisons, condition-aware representative selection stabilizes PM-MMD and improves recovery relative to clustering, factorization, and pairwise mixture-proportion baselines using the same unlabeled mixtures.
Stable and Scalable Probabilistic Numerical Solvers for Stiff and High-Dimensional ODEs
Filtering-based probabilistic numerical solvers for ordinary differential equations (ODEs) have been established as a flexible and efficient simulation framework with built-in numerical uncertainty quantification. However, problems that are both stiff and high-dimensional remain a challenge, as current methods are either stable and have cubic cost in the ODE dimension, or scale linearly at the expense of stability. In this paper, we close this gap and develop probabilistic ODE solvers that are both stable and scalable. We propose two complementary strategies. First, we develop a matrix-free update step that uses Jacobian-vector products, iterative linear solvers, and stochastic covariance estimation to enable linear scaling, all while retaining stability. Second, we propose iterative re-linearization to further improve stability without sacrificing scalability, turning probabilistic ODE solvers into fully implicit methods. We evaluate the proposed approaches on a range of stiff and high-dimensional problems and demonstrate improved stability and scalability over established probabilistic solvers.
Instrumented data for causal scientific machine learning
Scientific machine learning is limited less by model size than by the data it is trained on. Observational data records what happened but not why; template synthetic data has a known generating process but only for the simulator's template, not the case a user faces. We argue a third option is now operationally feasible: instrumented data, in which every datum carries the mechanistic model that produced it, an explicit uncertainty over that model, and an executable family of counterfactuals. Verification-and-validation (V&V) instrumented image-to-simulation pipelines are one realisation: a sensor observation becomes a fully specified, solver-backed simulation with explicit, editable parameters and a propagated aleatoric/epistemic uncertainty. The substrate is case-specific, mechanistically supervised, and supports causal interventions through Pearl's do-operator.
Assessing model calibration with boosting trees
In regression modelling, the primary objective is to approximate the true conditional mean of a response given a set of features. To this end, various statistical models are used to fit a regression function that provides a mean estimate for each single set of features. This function is said to be calibrated if the resulting mean estimates match the true conditional means for almost all features. Aiming for calibration seems not achievable in practice as models are fitted on finite samples of noisy observations. A weaker notion of calibration is auto-calibration (sometimes also called mean-calibration or well-calibration); see, for example, Kr uger-Ziegel [22] and Denuit et al. [7]. This notion goes back to earlier works on the reliability of probabilistic forecasts in meteorology; we refer to Bross [2], Sanders [26] and Murphy-Winkler [23]. It means that when responses are grouped according to their mean estimates, the average of the responses within each group matches this estimate. This property is important in various applications where sums of mean estimates have to match sums of responses at a global and local level. This is, for example, the case in insurance pricing as an auto-calibrated pricing system avoids systematic cross-subsidy between different price cohorts; we refer the reader to Pohle [24], Denuit et al. [6], Fissler et al. [9] and W uthrich-Merz [30].
Rank Intervals for Leaderboards: A Hierarchical Framework for Model Evaluation
Neuhof, Bitya, Benjamini, Yuval
Pretrained models are often evaluated on multi-task leaderboards to measure their applicability in diverse contexts. However, current methods for aggregating performance across tasks into leaderboard-level rankings do not address the uncertainty and variability at the task level. While recent works have proposed interval-based model rankings, the principled aggregation of uncertainty from individual tasks to leaderboard-level rankings remains unaddressed, and variation in models' performance across tasks is frequently obscured. In this work, we introduce a hierarchical framework that constructs model rank intervals with statistical guarantees at both levels: task-level rank confidence intervals from pairwise comparisons, and leaderboard-level rank prediction intervals using a conformal approach. This enables reliable quantification of model rank for each observed task and for new potential tasks. Experiments on simulated data and the TabArena and PromptEval (MMLU) benchmarks show that our method yields statistically valid and informative intervals, enabling reliable, uncertainty-aware model ranking on leaderboards.
MEC-Cox: Machine-Learning-Assisted Generalized Entropy Calibration for ATT Marginal Hazard-Ratio Estimation
Lee, Se Yoon, Kwon, Yonghyun, Kim, Jae Kwang
Externally controlled survival trials are increasingly used when concurrent randomized controls are infeasible, particularly in oncology and rare-disease settings with time-to-event endpoints. We target an average-treatment-effect-on-the-treated (ATT)-type marginal hazard-ratio estimand, comparing treatment with counterfactual control in the treated trial population, and estimate it using inverse-probability-weighted (IPW) Cox regression. Valid inference is challenging because IPW Cox regression depends on the weights through both event contributions and risk-set averages, making flexible machine-learning nuisance estimation difficult to incorporate directly. Building on machine-learning-assisted generalized entropy calibration (MEC) by Lee and Kim (2026), we propose MEC-Cox for ATT-weighted IPW Cox regression. The method begins with normalized source-propensity-score odds weights for external controls and then applies Bregman calibration to balance cross-fitted prognostic summaries between external controls and treated trial patients. The calibration basis may include control-survival predictions, Cox linear predictors, penalized-survival-model predictions, or other prognostic-score summaries. MEC-updated weights therefore play a dual role as source-transport and prognostic-score balancing weights. We establish consistency, characterize a calibration-induced efficiency gain, and develop a stacked sandwich variance estimator. Simulations show that MEC-Cox can reduce bias, increase efficiency, and improve coverage through flexible machine-learning-assisted adjustment.
Transfer learning for causal forest
Jocteur, Bérénice-Alexia, Maume-Deschamps, Véronique, Ribereau, Pierre
Transfer learning addresses the challenge of transfering knowledge from one domain to another. Traditional transfer learning focuses on adapting models trained on a source domain (with a lot of observations) to improve performance on a target domain (with few observations). In this work we consider the case of a model shift and we focus on the transfer learning applied to a causal forest namely HTERF. This causal forest aims to estimate the Conditional Average Treatment Effect (CATE). The approach considered is the offset method presented by Wang (2016) adapted to a causal context. This method relies on the use of intermediate models in order to estimate the offset between source and target distributions. Our main result is a bound on the CATE error of HTERF on target depending on the error of the intermediate models. Simulation studies show the good performances of this approach in different settings on simulations and on a real-world dataset.
Large-scale empirical tuning and comparison of default optimizers for variational inference
Campbell, Trevor, Huggins, Jonathan H., Kim, Kyurae, Margossian, Charles C.
Black-box variational inference (BBVI) is a methodology for posterior approximation that relies on stochastic optimization. In practice, the stochastic optimizers underpinning BBVI generally require extensive problem-specific tuning, which undermines its promise as a truly "black box" inference algorithm. However, over the past decade, many new adaptive stochastic optimization algorithms have been developed that reduce or remove entirely the need for tuning. In this work, we investigate this new collection of adaptive methods in the context of BBVI, with the goal of establishing the current state of the art in tuning-free optimization-based inference. In particular, we present a large-scale empirical evaluation of 56 stochastic gradient-based optimization algorithms applied to 1092 Bayesian inference optimization problems, involving over 550,000 individual optimization runs and 15 core-years of compute. The optimization algorithms we evaluate are chosen to represent a wide spectrum of recent approaches and the benchmark problems are chosen to span a range of difficulty, with posterior target dimension 1-10^4, condition number 1-10^8, and a range of variational families. Our results show that no single method dominates, but running a selection of 5 algorithms suffices to reliably get close to the best-possible observed performance. We thus provide a strong baseline for applications where expert tuning is not possible and for comparison when developing new stochastic optimization algorithms.
Boundary Variance Inflation Causes Acquisition Bias in Gaussian Processes
Bånkestad, Maria, Jarl, Sanna, Sjölund, Jens
Gaussian processes with stationary kernels on bounded domains exhibit inflated posterior variance near the boundary. Despite being a long-recognized artifact in geostatistics and a source of over-exploration in Bayesian optimization, the causes and effects of boundary-induced acquisition bias are underexplored. We trace the root cause to a simple geometric mechanism: the truncation of the kernel correlation neighborhood at the domain boundary creates an observation-independent distortion that worsens with dimensionality. We show how this distortion manifests across three acquisition classes: variance maximization concentrates selections at the corners, whereas negative integrated posterior variance and expected predictive information gain move selections inward to axis-aligned interior shells. These patterns arise without reference to any objective function, meaning that acquisition behavior can be dominated by kernel geometry rather than the desired task-specific uncertainty. To quantify this, we introduce a function-free selection-profile diagnostic for arbitrary acquisitions, kernels, and bounded-domain geometries.
Active Learning with Foundation Model Priors: Efficient Learning under Class Imbalance
Zhang, Jiancheng, Li, Meiqing, Zhang, Qi, Zhu, Yinglun
Real-world datasets across image and text domains are often characterized by skewed class distributions and noisy annotations, which jointly degrade model performance, particularly on minority classes. Among existing solutions, active learning offers an effective and efficient paradigm by selectively querying the most informative and balanced samples for annotation. We propose an innovative active learning framework that mitigates class imbalance and selects the most informative samples to annotate. Leveraging foundation model priors, our algorithm enables imbalance-aware co-decisions between foundation model and small model to tackle noisy and imbalanced labels across various domains. We introduce the first study to systematically explore active learning under the dual challenges of label noise and class imbalance across image and text domains. Extensive experiments on imbalanced datasets demonstrate that our method achieves substantial annotation savings-over 50% compared to the best active learning baseline-while preserving performance and robustness to label noise.