Technology
Hybrid Uncertainty Sensitivity Analysis Based on the HSIC for High-Dimensional Responses with Aleatory--Epistemic Separation
Zhong, Shijie, Fu, Jiangfeng, Wei, Pengfei
Quantifying the influence of hybrid aleatory and epistemic uncertainties on high-dimensional system responses remains a major challenge in global sensitivity analysis (GSA). Existing Hilbert--Schmidt Independence Criterion (HSIC)-based approaches are primarily restricted to single-output settings and lack a rigorous decomposition of heterogeneous uncertainty sources and their interactions. To address this limitation, a novel double-space tensor-product RKHS framework is proposed for sensitivity analysis under hybrid uncertainty. By constructing factorized kernels over both the latent input space and the multidimensional output space, a concurrent double Mรถbius inversion is derived to orthogonally decompose the global dependence measure into pure aleatory effects, pure epistemic effects, and their interaction contributions. The resulting dimension-wise sensitivity indices preserve the uncertainty attribution structure across all output dimensions. To satisfy the independence assumptions required by the decomposition, an auxiliary-variable representation based on the inverse probability integral transform is introduced, enabling the treatment of hierarchical uncertainties and Copula-induced correlations within a unified latent space. A fully vectorized single-loop implementation is further developed to avoid the computational burden of nested Monte Carlo simulation. Statistical significance and estimation uncertainty are quantified through permutation testing and Bootstrap confidence intervals. Numerical studies on a modified multi-output Ishigami function and an aerodynamic pressure-field problem demonstrate the accuracy, scalability, and practical applicability of the proposed framework.
Conformal calibration and look-elsewhere effect in anomaly detection for new-physics searches
Araz, Jack Y., Spannowsky, Michael
Machine-learned anomaly detection is reshaping searches for new physics, but it has outrun the statistics used to interpret it. A raw anomaly score has no calibrated meaning, a model that scans many regions inflates the look-elsewhere effect, and the asymptotic significances the field relies on are blind to the background mismodelling that anomaly detectors are especially prone to. We propose a calibration layer, built on conformal prediction, that turns any anomaly score into a defensible significance with distribution-free, finite-sample guarantees. Conformal prediction converts scores into valid local p-values, weighted and Mondrian variants repair the sideband-to-signal-region exchangeability failures that resonant searches suffer, and a Gross-Vitells step carries the result through to a look-elsewhere-aware global significance. The layer does two things at once. It exposes miscalibration that the standard pipeline cannot see, and it corrects it without retraining the detector. On public LHC Olympics data, a classifier develops a substructure-mass correlation that makes sideband-calibrated background p-values anti-conservative. Taken at face value, this manufactures a $\sim 46ฯ$ excess from background sculpting alone, which the label-free weighted correction removes, restoring an honest null. When run as a blind wide-mass bump hunt, the standard asymptotic and unweighted procedures fabricate $\gtrsim10ฯ$ excesses and $\approx5ฯ$ excesses even in signal-free windows, while the conformal layer raises no false alarms and its global false-positive rate is verified on background-only pseudoexperiments. The result is an auditable, detector-agnostic path from an uncalibrated score to a trials-factor-aware significance, ready to be folded into experimental anomaly searches.
Geometric Domain Adaptation via Optimal Transport for Linear Regression in R^2
Britos, Brian, Bourel, Mathias
Optimal Transport has become recently a powerful method for domain adaptation by aligning source and target distributions. We study a supervised domain adaptation problem where source and target domains are related by a rotation or a translation or a homothety in $\mathbb{R}^2$. We prove that the optimal transport map recovers the underlying map when using a $p-$norm cost with $p \geq 2$. Based on this insight, we develop a method combining $K-$means and optimal transport to estimate the underlying map, enabling adaptation of linear regression models when target data is scarce. Simulations demonstrate improved performance over baseline methods. Rather than relying on highly expressive deep learning architectures, we focus on classical machine learning models to emphasize interpretability and theoretical insight. This perspective allows us to explicitly characterize the role of optimal transport in recovering geometric transformations such as rotations, translations, and homotheties. Our contributions include a theoretical result linking optimal transport and rotations, translations and homothecies in $\mathbb{R}^2$, and a practical method for adaptation in linear regression offering both conceptual clarity and applied value in domain adaptation tasks in this space.
Beyond the Training Distribution: Evaluating Predictions Under Distribution Shift and Selection Bias
Ulichney, Annie, Coston, Amanda
Understanding how a prediction model will perform in a new environment before deployment is essential to preventing harm when algorithms inform decision-making. Two common sources of model performance degradation are (i) covariate shift, where the target covariate distribution differs from the source, and (ii) selective labels, where the observability of outcomes depends on historical decisions. We study pre-deployment model evaluation under the joint presence of covariate shift and labeling of outcomes selectively based on observed features. In particular, we present a double machine learning procedure for estimating the target risk of an arbitrary black-box prediction model under a general loss function. We show identification of this estimand under standard assumptions and derive a bias-corrected estimator based on the influence function of the target risk. Finally, we evaluate our estimator through experiments using the eICU electronic health records database, showing that it tracks the true target risk more accurately than methods that address either selective labels or covariate shift alone, as well as baselines that combine standard plug-in approaches.
LoMC: Localized Multidirectional Correction for Refusal Suppression in Routed Foundation Models
Hong, Yan, Xiu, Kedong, Li, Wei, Lan, Jun, Zhu, Huijia, Zhou, Shuheng, Lyu, Zhongcai, Wang, Weiqiang, Zhang, Jianfu
We study controlled post-training refusal suppression in routed MoE and hybrid-MoE foundation models, aiming to increase non-refusal target-response behavior while preserving general capability under a compact intervention footprint. Existing broad direction-based edits can perturb general-purpose computation, whereas support-only expert edits often lack sufficient capacity to correct heterogeneous refusal representations. To address this limitation, we introduce Localized Multidirectional Correction (LoMC), a support-gated intervention framework that follows a support-then-correction execution order: it first identifies a compact edit support, then aggregates prototype correction directions into layer-wise correction directions, and finally applies rank-one layer-wise correction only within the selected support. By using the edit support as a structural gating constraint, LoMC increases correction capacity without expanding the intervention scope. Experiments on text-only and multimodal safety benchmarks across four routed backbones show that LoMC substantially improves non-refusal target-response behavior while maintaining general capability under a compact intervention footprint.
Temperature transferable Machine Learned Coarse Grained model for proteins
Venturin, Jacopo, Clementi, Cecilia
Coarse-grained (CG) molecular simulations offer an efficient alternative to atomistic molecular dynamics to study large and complex biological systems. The accuracy of CG simulations has been increased dramatically by the introduction of machine-learned coarse-grained (MLCG) models. However, these models are typically designed to be used at a single thermodynamic point, lack temperature transferability, and can not be used to predict temperature dependent quantities like the heat capacity. Here we introduce a thermodynamically informed, temperature-transferable MLCG framework for proteins that explicitly decomposes the CG potential of mean force (PMF) into its energetic and entropic components. The model architecture enforces an exact thermodynamic relation between the energetic and entropic components of the PMF and guarantees physically consistent extrapolation and interpolation across temperature regimes. We validate this framework on an extensive dataset spanning a total of 250 $ฮผ$s of molecular dynamics simulations across five temperatures between 300 K and 400 K for the Chignolin protein, and demonstrate that it reproduces the temperature dependency of the reference atomistic free energy surfaces, correcting the temperature-unaware baselines. Furthermore, we show that it is possible to apply an inexpensive, post-hoc temperature-dependent correction that does not require retraining the MLCG potential, accurately recovering the atomistic heat capacity at different temperatures. Overall, this work provides a physically grounded pathway toward thermodynamically transferable MLCG simulations of complex biomolecular systems.
Federated Learning for Feature Generalization with Convex Constraints
Kim, Dongwon, Kim, Donghee, Shyn, Sung Kuk, Kim, Kwangsu
Federated learning (FL) often struggles with generalization due to heterogeneous client data. Local models are prone to overfitting their local data distributions, and even transferable features can be distorted during aggregation. To address these challenges, we propose FedCONST, an approach that adaptively modulates update magnitudes based on the global model's parameter strength. This prevents over-emphasizing welllearned parameters while reinforcing underdeveloped ones. Specifically, FedCONST employs linear convex constraints to ensure training stabil-Figure 1. Illustration of the parameter space in FL. (1) Vanilla ity and preserve locally learned generalization FL drives the optimization process away from the generalization capabilities during aggregation.
A Stationarity-and-Coupling Criterion for Training-Free Time-Lagged Spectral Embeddings of Multivariate Time Series
Pal, Siddharth, Rojkova, Viktoria
We study training-free fixed-length descriptors for multivariate time series and ask not merely whether such a descriptor performs well, but when it can be expected to work at all. Our object of study is $D(ฯ)$, built from a time-lagged correlation matrix truncated at the Marchenko-Pastur edge so that only signal-bearing eigenvalues survive and classified by cosine similarity to class centroids with zero learned parameters. The central contribution is not the descriptor but a falsifiable applicability criterion for it. Working from a stationary Gaussian VAR(1) model, we argue that $D(ฯ)$ separates two classes when the signals are approximately stationary and the class information lives in their cross-channel temporal coupling rather than in marginal per-channel power. We derive, semi-formally, three consequences: a distinguishability condition, why the static ($ฯ=0$) covariance collapses to chance, and why a stationary but power-discriminated paradigm defeats the descriptor. The criterion is operational: a two-part pre-flight test -- an augmented Dickey-Fuller stationarity check and a power-baseline saturation check -- predicts applicability before any training. We validate both halves on a mixed assortment. On four paradigms that satisfy the criterion (Sleep-EDF, BCI-IV-2a, MIT-BIH, ESC-50) the descriptor is competitive with strong baselines at a fraction of their cost, reaching $88.5\pm4.5\%$ under 20-subject leave-one-subject-out on Sleep-EDF on a single CPU thread. On three that violate it -- non-stationary ERPs, and financial-volatility and wearable-stress regimes that are power-discriminated -- it fails exactly as the pre-flight predicts, and these negatives are the more informative half. We are explicit that $D(ฯ)$ is not the most accurate representation; its value is a compact, training-free embedding whose domain of validity is known in advance.
GPAS: Accelerating Convergence of LLMPretraining via Gradient-Preserving Activation Scaling
Modern Large Language Models, such as the LLaMA, Qwen and DeepSeek series, predominantly adopt the Pre-LayerNorm (Pre-LN) Transformer architecture. While being stable during pretraining and scalable to large model sizes, Pre-LN suffers from an exponential growth in activation variance across layers, causing the shortcut to dominate over sub-layer outputs in the residual connection and limiting the learning capacity of deeper layers. To mitigate this issue, we propose Gradient-Preserving Activation Scaling (GPAS), a simple technique that can be used in combination with existing approaches. GPAS works by scaling down the intermediate activations while keeping their gradients unchanged. This leaves information in the activations intact, and avoids the gradient vanishing problem associated with gradient downscaling. Extensive experiments across various model sizes from 71M to 1B show that GPAS achieves consistent performance gains. Beyond enhancing Pre-LNTransformers, GPAS also shows promise in improving alternative architectures such as Sandwich-LN and DeepNorm, demonstrating its versatility and potential for improving training dynamics in a wide range of settings.
High-order Equivariant Flow Matching for Density Functional Theory Hamiltonian Prediction
Density functional theory (DFT) is a fundamental method for simulating quantum chemical properties, but it remains expensive due to the iterative self-consistent field (SCF) process required to solve the Kohn-Sham equations. Recently, deep learning methods are gaining attention as a way to bypass this step by directly predicting the Hamiltonian. However, they rely on deterministic regression and do not consider the highly structured nature of Hamiltonians. In this work, we propose QHFLOW, a high-order equivariant flow matching framework that generates Hamiltonian matrices conditioned on molecular geometry. Flow matching models continuous-time trajectories between simple priors and complex targets, learning the structured distributions over Hamiltonians instead of direct regression. To further incorporate symmetry, we use a neural architecture that predicts SE(3)-equivariant vector fields, improving accuracy and generalization across diverse geometries. To further enhance physical fidelity, we additionally introduce a fine-tuning scheme to align predicted orbital energies with the target. QHFLOW achieves state-of-the-art performance, reducing Hamiltonian error by 73% on MD17 and 53% on QH9 compared to the previous best model. Moreover, we further show that QHFLOW accelerates the DFT process without trading off the solution quality when initializing SCF iterations with the predicted Hamiltonian, significantly reducing the number of iterations and runtime.