Uncertainty
Best-of-$\infty$ -- Asymptotic Performance of Test-Time Compute
Komiyama, Junpei, Oba, Daisuke, Oyamada, Masafumi
We study best-of-$N$ for large language models (LLMs) where the selection is based on majority voting. In particular, we analyze the limit $N \to \infty$, which we denote as Best-of-$\infty$. While this approach achieves impressive performance in the limit, it requires an infinite test-time budget. To address this, we propose an adaptive generation scheme that selects $N$ based on answer agreement, thereby efficiently allocating inference-time computation. Beyond adaptivity, we extend the framework to weighted ensembles of multiple LLMs, showing that such mixtures can outperform any individual model. The optimal ensemble weighting is formulated and efficiently computed as a mixed-integer linear program. Extensive experiments demonstrate the effectiveness of our approach.
Efficient Ensemble Conditional Independence Test Framework for Causal Discovery
Constraint-based causal discovery relies on numerous conditional independence tests (CITs), but its practical applicability is severely constrained by the prohibitive computational cost, especially as CITs themselves have high time complexity with respect to the sample size. To address this key bottleneck, we introduce the Ensemble Conditional Independence Test (E-CIT), a general and plug-and-play framework. E-CIT operates on an intuitive divide-and-aggregate strategy: it partitions the data into subsets, applies a given base CIT independently to each subset, and aggregates the resulting p-values using a novel method grounded in the properties of stable distributions. This framework reduces the computational complexity of a base CIT to linear in the sample size when the subset size is fixed. Moreover, our tailored p-value combination method offers theoretical consistency guarantees under mild conditions on the subtests. Experimental results demonstrate that E-CIT not only significantly reduces the computational burden of CITs and causal discovery but also achieves competitive performance. Notably, it exhibits an improvement in complex testing scenarios, particularly on real-world datasets.
Learning Ising Models under Hard Constraints using One Sample
Chauhan, Rohan, Panageas, Ioannis
We consider the problem of estimating inverse temperature parameter $ฮฒ$ of an $n$-dimensional truncated Ising model using a single sample. Given a graph $G = (V,E)$ with $n$ vertices, a truncated Ising model is a probability distribution over the $n$-dimensional hypercube $\{-1,1\}^n$ where each configuration $\mathbfฯ$ is constrained to lie in a truncation set $S \subseteq \{-1,1\}^n$ and has probability $\Pr(\mathbfฯ) \propto \exp(ฮฒ\mathbfฯ^\top A\mathbfฯ)$ with $A$ being the adjacency matrix of $G$. We adopt the recent setting of [Galanis et al. SODA'24], where the truncation set $S$ can be expressed as the set of satisfying assignments of a $k$-SAT formula. Given a single sample $\mathbfฯ$ from a truncated Ising model, with inverse parameter $ฮฒ^*$, underlying graph $G$ of bounded degree $ฮ$ and $S$ being expressed as the set of satisfying assignments of a $k$-SAT formula, we design in nearly $O(n)$ time an estimator $\hatฮฒ$ that is $O(ฮ^3/\sqrt{n})$-consistent with the true parameter $ฮฒ^*$ for $k \gtrsim \log(d^2k)ฮ^3.$ Our estimator is based on the maximization of the pseudolikelihood, a notion that has received extensive analysis for various probabilistic models without [Chatterjee, Annals of Statistics '07] or with truncation [Galanis et al. SODA '24]. Our approach generalizes recent techniques from [Daskalakis et al. STOC '19, Galanis et al. SODA '24], to confront the more challenging setting of the truncated Ising model.
RAPTOR-GEN: RApid PosTeriOR GENerator for Bayesian Learning in Biomanufacturing
Biopharmaceutical manufacturing is vital to public health but lacks the agility for rapid, on-demand production of biotherapeutics due to the complexity and variability of bioprocesses. T o overcome this, we introduce RApid PosT eriOR GENerator (RAPTOR-GEN), a mechanism-informed Bayesian learning framework designed to accelerate intelligent digital twin development from sparse and heterogeneous experimental data. This framework is built on a multi-scale probabilistic knowledge graph (pKG), formulated as a stochastic differential equation (SDE)-based foundational model that captures the nonlinear dynamics of bioprocesses. RAPTOR-GEN consists of two ingredients: (i) an interpretable metamodel integrating linear noise approximation (LNA) that exploits the structural information of bioprocessing mechanisms and a sequential learning strategy to fuse heterogeneous and sparse data, enabling inference of latent state variables and explicit approximation of the intractable likelihood function; and (ii) an efficient Bayesian posterior sampling method that utilizes Langevin diffusion (LD) to accelerate posterior exploration by exploiting the gradients of the derived likelihood. It generalizes the LNA approach to circumvent the challenge of step size selection, facilitating robust learning of mechanistic parameters with provable finite-sample performance guarantees. We develop a fast and robust RAPTOR-GEN algorithm with controllable error. Numerical experiments demonstrate its effectiveness in uncovering the underlying regulatory mechanisms of biomanufacturing processes. Funding: This research was supported by National Science Foundation Grant CAREER CMMI-2442970 and National Institute of Standards and T echnology Grant 70NANB21H086.
A Hierarchical Variational Graph Fused Lasso for Recovering Relative Rates in Spatial Compositional Data
Teixeira, Joaquim Valerio, Reznik, Ed, Banerjee, Sudpito, Tansey, Wesley
The analysis of spatial data from biological imaging technology, such as imaging mass spectrometry (IMS) or imaging mass cytometry (IMC), is challenging because of a competitive sampling process which convolves signals from molecules in a single pixel. To address this, we develop a scalable Bayesian framework that leverages natural sparsity in spatial signal patterns to recover relative rates for each molecule across the entire image. Our method relies on the use of a heavy-tailed variant of the graphical lasso prior and a novel hierarchical variational family, enabling efficient inference via automatic differentiation variational inference. Simulation results show that our approach outperforms state-of-the-practice point estimate methodologies in IMS, and has superior posterior coverage than mean-field variational inference techniques. Results on real IMS data demonstrate that our approach better recovers the true anatomical structure of known tissue, removes artifacts, and detects active regions missed by the standard analysis approach.
The Sensitivity of Variational Bayesian Neural Network Performance to Hyperparameters
Koermer, Scott, Klein, Natalie
In scientific applications, predictive modeling is often of limited use without accurate uncertainty quantification (UQ) to indicate when a model may be extrapolating or when more data needs to be collected. Bayesian Neural Networks (BNNs) produce predictive uncertainty by propagating uncertainty in neural network (NN) weights and offer the promise of obtaining not only an accurate predictive model but also accurate UQ. However, in practice, obtaining accurate UQ with BNNs is difficult due in part to the approximations used for practical model training and in part to the need to choose a suitable set of hyperparameters; these hyperparameters outnumber those needed for traditional NNs and often have opaque effects on the results. We aim to shed light on the effects of hyperparameter choices for BNNs by performing a global sensitivity analysis of BNN performance under varying hyperparameter settings. Our results indicate that many of the hyperparameters interact with each other to affect both predictive accuracy and UQ. For improved usage of BNNs in real-world applications, we suggest that global sensitivity analysis, or related methods such as Bayesian optimization, should be used to aid in dimensionality reduction and selection of hyperparameters to ensure accurate UQ in BNNs.
A Compound Classification System Based on Fuzzy Relations Applied to the Noise-Tolerant Control of a Bionic Hand via EMG Signal Recognition
Trajdos, Pawel, Kurzynski, Marek
Modern anthropomorphic upper limb bioprostheses are typically controlled by electromyographic (EMG) biosignals using a pattern recognition scheme. Unfortunately, there are many factors originating from the human source of objects to be classified and from the human-prosthesis interface that make it difficult to obtain an acceptable classification quality. One of these factors is the high susceptibility of biosignals to contamination, which can considerably reduce the quality of classification of a recognition system. In the paper, the authors propose a new recognition system intended for EMG based control of the hand prosthesis with detection of contaminated biosignals in order to mitigate the adverse effect of contaminations. The system consists of two ensembles: the set of one-class classifiers (OCC) to assess the degree of contamination of individual channels and the ensemble of K-nearest neighbours (KNN) classifier to recognise the patient's intent. For all recognition systems, an original, coherent fuzzy model was developed, which allows the use of a uniform soft (fuzzy) decision scheme throughout the recognition process. The experimental evaluation was conducted using real biosignals from a public repository. The goal was to provide an experimental comparative analysis of the parameters and procedures of the developed method on which the quality of the recognition system depends. The proposed fuzzy recognition system was also compared with similar systems described in the literature.
Normalizing Flows are Capable Visuomotor Policy Learning Models
Lind, Simon Kristoffersson, Li, Jialong, Stenmark, Maj, Krรผger, Volker
Abstract-- The field of general purpose robotics has recently embraced powerful probabilistic models, such as diffusion models, to model and learn complex behaviors. However, these models often come with significant trade-offs, namely high computational costs for inference and a fundamental inability to quantify output uncertainty. We argue that a model's trustworthiness, a critical factor for reliable, general-purpose robotics, is inherently linked to its ability to provide confidence measures. In this work, we introduce Normalizing Flows Policy, a novel visuomotor policy learning model based on Normalizing Flows. We show that Normalizing Flows are a natural and powerful alternative to diffusion models, providing both a statistically sound measure of confidence and a highly efficient inference process. Through comprehensive experiments across four distinct simulated robotic tasks, we demonstrate that Normalizing Flows Policy achieves performance comparable to, and often surpassing, Diffusion Policy, and it does so not only with improved sample efficiency but also with up to 30 times faster inference. Additionally, our ablation study validates several key architectural and training techniques that enable Normalizing Flows to perform well in this domain.
Philosophy-informed Machine Learning
A deep dive into the open literature shows that there are t hree fundamental limitations to current ML approaches, namely blackbox brittleness (which renders models uninterpretable and unreliable under distribution shift [2]), causal blindness (which conflates correlation with causation [3]), and alignment failures (which produce systems optimizing objectives misaligned with human values [4]) . These deficiencies stem from a profound philosophical poverty in how ML conceptualizes knowledge, reasoning, and values. The first fundamental limitation, b lackbox brittleness, manifests when trained models fail on seemingly trivial variations of their training distribution. For example, a vision model that accurately identifies stop signs under normal conditions might misclassify them entirely when small adversarial perturbations are applied [5] . Not surprisingly, t h e same brittleness extends beyond adversarial examples to everyday distribution shifts (e.g., natural language processing models exhibit performance degradation when processing text from different cultural contexts, etc.) [6] .