Goto

Collaborating Authors

 Bayesian Inference


AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking

arXiv.org Artificial Intelligence

Listwise reranking with large language models (LLMs) enhances top-ranked results in retrieval-based applications. Due to the limit in context size and high inference cost of long context, reranking is typically performed over a fixed size of small subsets, with the final ranking aggregated from these partial results. This fixed computation disregards query difficulty and document distribution, leading to inefficiencies. We propose AcuRank, an adaptive reranking framework that dynamically adjusts both the amount and target of computation based on uncertainty estimates over document relevance. Using a Bayesian TrueSkill model, we iteratively refine relevance estimates until reaching sufficient confidence levels, and our explicit modeling of ranking uncertainty enables principled control over reranking behavior and avoids unnecessary updates to confident predictions. Results on the TREC-DL and BEIR benchmarks show that our method consistently achieves a superior accuracy-efficiency trade-off and scales better with compute than fixed-computation baselines. These results highlight the effectiveness and generalizability of our method across diverse retrieval tasks and LLM-based reranking models.


Surrogate-based quantification of policy uncertainty in generative flow networks

arXiv.org Machine Learning

Generative flow networks are able to sample, via sequential construction, high-reward, complex objects according to a reward function. However, such reward functions are often estimated approximately from noisy data, leading to epistemic uncertainty in the learnt policy. We present an approach to quantify this uncertainty by constructing a surrogate model composed of a polynomial chaos expansion, fit on a small ensemble of trained flow networks. This model learns the relationship between reward functions, parametrised in a low-dimensional space, and the probability distributions over actions at each step along a trajectory of the flow network. The surrogate model can then be used for inexpensive Monte Carlo sampling to estimate the uncertainty in the policy given uncertain rewards. We illustrate the performance of our approach on a discrete and continuous grid-world, symbolic regression, and a Bayesian structure learning task.


Certified Self-Consistency: Statistical Guarantees and Test-Time Training for Reliable Reasoning in LLMs

arXiv.org Machine Learning

Recent advances such as self-consistency and test-time reinforcement learning (TTRL) improve the reliability of large language models (LLMs) without additional supervision, yet their underlying mechanisms and statistical guarantees remain poorly understood. We present a unified framework for certifiable inference in LLMs, showing that majority voting provides a statistical certificate of self-consistency: under mild assumptions, the aggregated answer coincides with the mode of the model's terminal distribution with high probability. We derive finite-sample and anytime-valid concentration bounds that quantify this confidence, and introduce the Martingale Majority Certificate (MMC), a sequential stopping rule that adaptively determines when sufficient samples have been drawn. We further prove that label-free post-training methods such as TTRL implicitly sharpen the answer distribution by exponentially tilting it toward its mode, thereby reducing the number of samples required for certification. Building on this insight, we propose new post-training objectives that explicitly optimise this trade-off between sharpness and bias. Together, these results explain and connect two central test-time scaling strategies, self-consistency and TTRL, within a single statistical framework for label-free, certifiable reliability in reasoning LLMs.


Text Generation Beyond Discrete Token Sampling

arXiv.org Artificial Intelligence

In standard autoregressive generation, an LLM predicts the next-token distribution, samples a discrete token, and then discards the distribution, passing only the sampled token as new input. To preserve this distribution's rich information, we propose Mixture of Inputs (MoI), a training-free method for autoregressive generation. After generating a token following the standard paradigm, we construct a new input that blends the generated discrete token with the previously discarded token distribution. Specifically, we employ a Bayesian estimation method that treats the token distribution as the prior, the sampled token as the observation, and replaces the conventional one-hot vector with the continuous posterior expectation as the new model input. MoI allows the model to maintain a richer internal representation throughout the generation process, resulting in improved text quality and reasoning capabilities. On mathematical reasoning, code generation, and PhD-level QA tasks, MoI consistently improves performance across multiple models including QwQ-32B, Nemotron-Super-49B, Gemma-3-27B, and DAPO-Qwen-32B, with no additional training and negligible computational overhead.


Near optimal sample complexity for matrix and tensor normal models via geodesic convexity

arXiv.org Artificial Intelligence

The matrix normal model, i.e., the family of Gaussian matrix-variate distributions whose covariance matrices are the Kronecker product of two lower dimensional factors, is frequently used to model matrix-variate data. The tensor normal model generalizes this family to Kronecker products of three or more factors. We study the estimation of the Kronecker factors of the covariance matrix in the matrix and tensor normal models. For the above models, we show that the maximum likelihood estimator (MLE) achieves nearly optimal nonasymptotic sample complexity and nearly tight error rates in the Fisher-Rao and Thompson metrics. In contrast to prior work, our results do not rely on the factors being well-conditioned or sparse, nor do we need to assume an accurate enough initial guess. For the matrix normal model, all our bounds are minimax optimal up to logarithmic factors, and for the tensor normal model our bounds for the largest factor and for overall covariance matrix are minimax optimal up to constant factors provided there are enough samples for any estimator to obtain constant Frobenius error. In the same regimes as our sample complexity bounds, we show that the flip-flop algorithm, a practical and widely used iterative procedure to compute the MLE, converges linearly with high probability. Our main technical insight is that, given enough samples, the negative log-likelihood function is strongly geodesically convex in the geometry on positive-definite matrices induced by the Fisher information metric. This strong convexity is determined by the expansion of certain random quantum channels.


Federated Learning via Meta-Variational Dropout

arXiv.org Artificial Intelligence

Federated Learning (FL) aims to train a global inference model from remotely distributed clients, gaining popularity due to its benefit of improving data privacy. However, traditional FL often faces challenges in practical applications, including model overfitting and divergent local models due to limited and non-IID data among clients. To address these issues, we introduce a novel Bayesian meta-learning approach called meta-variational dropout (MetaVD). MetaVD learns to predict client-dependent dropout rates via a shared hypernetwork, enabling effective model personalization of FL algorithms in limited non-IID data settings. We also emphasize the posterior adaptation view of meta-learning and the posterior aggregation view of Bayesian FL via the conditional dropout posterior. We conducted extensive experiments on various sparse and non-IID FL datasets. MetaVD demonstrated excellent classification accuracy and uncertainty calibration performance, especially for out-of-distribution (OOD) clients. MetaVD compresses the local model parameters needed for each client, mitigating model overfitting and reducing communication costs. Code is available at https://github.com/insujeon/MetaVD.


Bayes or Heisenberg: Who(se) Rules?

arXiv.org Artificial Intelligence

Although quantum systems are generally described by quantum state vectors, we show that in certain cases their measurement processes can be reformulated as probabilistic equations expressed in terms of probabilistic state vectors. These probabilistic representations can, in turn, be approximated by the neural network dynamics of the Tensor Brain (TB) model. The Tensor Brain is a recently proposed framework for modeling perception and memory in the brain, providing a biologically inspired mechanism for efficiently integrating generated symbolic representations into reasoning processes.


Neural Variational Dropout Processes

arXiv.org Artificial Intelligence

Learning to infer the conditional posterior model is a key step for robust meta-learning. This paper presents a new Bayesian meta-learning approach called Neural Variational Dropout Processes (NVDPs). NVDPs model the conditional posterior distribution based on a task-specific dropout; a low-rank product of Bernoulli experts meta-model is utilized for a memory-efficient mapping of dropout rates from a few observed contexts. It allows for a quick reconfiguration of a globally learned and shared neural network for new tasks in multi-task few-shot learning. In addition, NVDPs utilize a novel prior conditioned on the whole task data to optimize the conditional \textit{dropout} posterior in the amortized variational inference. Surprisingly, this enables the robust approximation of task-specific dropout rates that can deal with a wide range of functional ambiguities and uncertainties. We compared the proposed method with other meta-learning approaches in the few-shot learning tasks such as 1D stochastic regression, image inpainting, and classification. The results show the excellent performance of NVDPs.


Risk Assessment of an Autonomous Underwater Snake Robot in Confined Operations

arXiv.org Artificial Intelligence

The growing interest in ocean discovery imposes a need for inspection and intervention in confined and demanding environments. Eely's slender shape, in addition to its ability to change its body configurations, makes articulated underwater robots an adequate option for such environments. However, operation of Eely in such environments imposes demanding requirements on the system, as it must deal with uncertain and unstructured environments, extreme environmental conditions, and reduced navigational capabilities. This paper proposes a Bayesian approach to assess the risks of losing Eely during two mission scenarios. The goal of this work is to improve Eely's performance and the likelihood of mission success. Sensitivity analysis results are presented in order to demonstrate the causes having the highest impact on losing Eely.


No Intelligence Without Statistics: The Invisible Backbone of Artificial Intelligence

arXiv.org Artificial Intelligence

The rapid ascent of artificial intelligence (AI) is often portrayed as a revolution born from computer science and engineering. This narrative, however, obscures a fundamental truth: the theoretical and methodological core of AI is, and has always been, statistical. This paper systematically argues that the field of statistics provides the indispensable foundation for machine learning and modern AI. We deconstruct AI into nine foundational pillars-Inference, Density Estimation, Sequential Learning, Generalization, Representation Learning, Interpretability, Causality, Optimization, and Unification-demonstrating that each is built upon century-old statistical principles. From the inferential frameworks of hypothesis testing and estimation that underpin model evaluation, to the density estimation roots of clustering and generative AI; from the time-series analysis inspiring recurrent networks to the causal models that promise true understanding, we trace an unbroken statistical lineage. While celebrating the computational engines that power modern AI, we contend that statistics provides the brain-the theoretical frameworks, uncertainty quantification, and inferential goals-while computer science provides the brawn-the scalable algorithms and hardware. Recognizing this statistical backbone is not merely an academic exercise, but a necessary step for developing more robust, interpretable, and trustworthy intelligent systems. We issue a call to action for education, research, and practice to re-embrace this statistical foundation. Ignoring these roots risks building a fragile future; embracing them is the path to truly intelligent machines. There is no machine learning without statistical learning; no artificial intelligence without statistical thought.