Goto

Collaborating Authors

 Bayesian Inference


Wasserstein-Cramér-Rao Theory of Unbiased Estimation

arXiv.org Machine Learning

The quantity of interest in the classical Cramér-Rao theory of unbiased estimation (e.g., the Cramér-Rao lower bound, its exact attainment for exponential families, and asymptotic efficiency of maximum likelihood estimation) is the variance, which represents the instability of an estimator when its value is compared to the value for an independently-sampled data set from the same distribution. In this paper we are interested in a quantity which represents the instability of an estimator when its value is compared to the value for an infinitesimal additive perturbation of the original data set; we refer to this as the "sensitivity" of an estimator. The resulting theory of sensitivity is based on the Wasserstein geometry in the same way that the classical theory of variance is based on the Fisher-Rao (equivalently, Hellinger) geometry, and this insight allows us to determine a collection of results which are analogous to the classical case: a Wasserstein-Cramér-Rao lower bound for the sensitivity of any unbiased estimator, a characterization of models in which there exist unbiased estimators achieving the lower bound exactly, and some concrete results that show that the Wasserstein projection estimator achieves the lower bound asymptotically. We use these results to treat many statistical examples, sometimes revealing new optimality properties for existing estimators and other times revealing entirely new estimators.


A Latent-Variable Formulation of the Poisson Canonical Polyadic Tensor Model: Maximum Likelihood Estimation and Fisher Information

arXiv.org Machine Learning

We establish parameter inference for the Poisson canonical polyadic (PCP) tensor model through a latent-variable formulation. Our approach exploits the observation that any random PCP tensor can be derived by marginalizing an unobservable random tensor of one dimension larger. The loglikelihood of this larger dimensional tensor, referred to as the "complete" loglikelihood, is comprised of multiple rank one PCP loglikelihoods. Using this methodology, we first derive non-iterative maximum likelihood estimators for the PCP model and demonstrate that several existing algorithms for fitting non-negative matrix and tensor factorizations are Expectation-Maximization algorithms. Next, we derive the observed and expected Fisher information matrices for the PCP model. The Fisher information provides us crucial insights into the well-posedness of the tensor model, such as the role that tensor rank plays in identifiability and indeterminacy. For the special case of rank one PCP models, we demonstrate that these results are greatly simplified.


Epistemic Reject Option Prediction

arXiv.org Artificial Intelligence

In high-stakes applications, predictive models must not only produce accurate predictions but also quantify and communicate their uncertainty. Reject-option prediction addresses this by allowing the model to abstain when prediction uncertainty is high. Traditional reject-option approaches focus solely on aleatoric uncertainty, an assumption valid only when large training data makes the epistemic uncertainty negligible. However, in many practical scenarios, limited data makes this assumption unrealistic. This paper introduces the epistemic reject-option predictor, which abstains in regions of high epistemic uncertainty caused by insufficient data. Building on Bayesian learning, we redefine the optimal predictor as the one that minimizes expected regret -- the performance gap between the learned model and the Bayes-optimal predictor with full knowledge of the data distribution. The model abstains when the regret for a given input exceeds a specified rejection cost. To our knowledge, this is the first principled framework that enables learning predictors capable of identifying inputs for which the training data is insufficient to make reliable decisions.


Learning Dynamics from Input-Output Data with Hamiltonian Gaussian Processes

arXiv.org Artificial Intelligence

Embedding non-restrictive prior knowledge, such as energy conservation laws, in learning-based approaches is a key motive to construct physically consistent models from limited data, relevant for, e.g., model-based control. Recent work incorporates Hamiltonian dynamics into Gaussian Process (GP) regression to obtain uncertainty-quantifying models that adhere to the underlying physical principles. However, these works rely on velocity or momentum data, which is rarely available in practice. In this paper, we consider dynamics learning with non-conservative Hamiltonian GPs, and address the more realistic problem setting of learning from input-output data. We provide a fully Bayesian scheme for estimating probability densities of unknown hidden states, of GP hyperparameters, as well as of structural hyperparameters, such as damping coefficients. Considering the computational complexity of GPs, we take advantage of a reduced-rank GP approximation and leverage its properties for computationally efficient prediction and training. The proposed method is evaluated in a nonlinear simulation case study and compared to a state-of-the-art approach that relies on momentum measurements.


DL101 Neural Network Outputs and Loss Functions

arXiv.org Artificial Intelligence

The loss function used to train a neural network is strongly connected to its output layer from a statistical point of view. This technical report analyzes common activation functions for a neural network output layer, like linear, sigmoid, ReLU, and softmax, detailing their mathematical properties and their appropriate use cases. A strong statistical justification exists for the selection of the suitable loss function for training a deep learning model. This report connects common loss functions such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and various Cross-Entropy losses to the statistical principle of Maximum Likelihood Estimation (MLE). Choosing a specific loss function is equivalent to assuming a specific probability distribution for the model output, highlighting the link between these functions and the Generalized Linear Models (GLMs) that underlie network output layers. Additional scenarios of practical interest are also considered, such as alternative output encodings, constrained outputs, and distributions with heavy tails.


ReNiL: Event-Driven Pedestrian Bayesian Localization Using IMU for Real-World Applications

arXiv.org Artificial Intelligence

Pedestrian inertial localization is key for mobile and IoT services because it provides infrastructure-free positioning. Yet most learning-based methods depend on fixed sliding-window integration, struggle to adapt to diverse motion scales and cadences, and yield inconsistent uncertainty, limiting real-world use. We present ReNiL, a Bayesian deep-learning framework for accurate, efficient, and uncertainty-aware pedestrian localization. ReNiL introduces Inertial Positioning Demand Points (IPDPs) to estimate motion at contextually meaningful waypoints instead of dense tracking, and supports inference on IMU sequences at any scale so cadence can match application needs. It couples a motion-aware orientation filter with an Any-Scale Laplace Estimator (ASLE), a dual-task network that blends patch-based self-supervision with Bayesian regression. By modeling displacements with a Laplace distribution, ReNiL provides homogeneous Euclidean uncertainty that integrates cleanly with other sensors. A Bayesian inference chain links successive IPDPs into consistent trajectories. On RoNIN-ds and a new WUDataset covering indoor and outdoor motion from 28 participants, ReNiL achieves state-of-the-art displacement accuracy and uncertainty consistency, outperforming TLIO, CTIN, iMoT, and RoNIN variants while reducing computation. Application studies further show robustness and practicality for mobile and IoT localization, making ReNiL a scalable, uncertainty-aware foundation for next-generation positioning.


Forgetting is Everywhere

arXiv.org Machine Learning

A fundamental challenge in developing general learning algorithms is their tendency to forget past knowledge when adapting to new data. Addressing this problem requires a principled understanding of forgetting; yet, despite decades of study, no unified definition has emerged that provides insights into the underlying dynamics of learning. We propose an algorithm- and task-agnostic theory that characterises forgetting as a lack of self-consistency in a learner's predictive distribution over future experiences, manifesting as a loss of predictive information. Our theory naturally yields a general measure of an algorithm's propensity to forget. To validate the theory, we design a comprehensive set of experiments that span classification, regression, generative modelling, and reinforcement learning. We empirically demonstrate how forgetting is present across all learning settings and plays a significant role in determining learning efficiency. Together, these results establish a principled understanding of forgetting and lay the foundation for analysing and improving the information retention capabilities of general learning algorithms.


LUME-DBN: Full Bayesian Learning of DBNs from Incomplete data in Intensive Care

arXiv.org Artificial Intelligence

Dynamic Bayesian networks (DBNs) are increasingly used in healthcare due to their ability to model complex temporal relationships in patient data while maintaining interpretability, an essential feature for clinical decision-making. However, existing approaches to handling missing data in longitudinal clinical datasets are largely derived from static Bayesian networks literature, failing to properly account for the temporal nature of the data. This gap limits the ability to quantify uncertainty over time, which is particularly critical in settings such as intensive care, where understanding the temporal dynamics is fundamental for model trustworthiness and applicability across diverse patient groups. Despite the potential of DBNs, a full Bayesian framework that integrates missing data handling remains underdeveloped. In this work, we propose a novel Gibbs sampling-based method for learning DBNs from incomplete data. Our method treats each missing value as an unknown parameter following a Gaussian distribution. At each iteration, the unobserved values are sampled from their full conditional distributions, allowing for principled imputation and uncertainty estimation. We evaluate our method on both simulated datasets and real-world intensive care data from critically ill patients. Compared to standard model-agnostic techniques such as MICE, our Bayesian approach demonstrates superior reconstruction accuracy and convergence properties. These results highlight the clinical relevance of incorporating full Bayesian inference in temporal models, providing more reliable imputations and offering deeper insight into model behavior. Our approach supports safer and more informed clinical decision-making, particularly in settings where missing data are frequent and potentially impactful.


Rater Equivalence: Evaluating Classifiers in Human Judgment Settings

arXiv.org Artificial Intelligence

In many decision settings, the definitive ground truth is either non-existent or inaccessible. We introduce a framework for evaluating classifiers based solely on human judgments. In such cases, it is helpful to compare automated classifiers to human judgment. We quantify a classifier's performance by its rater equivalence: the smallest number of human raters whose combined judgment matches the classifier's performance. Our framework uses human-generated labels both to construct benchmark panels and to evaluate performance. We distinguish between two models of utility: one based on agreement with the assumed but inaccessible ground truth, and one based on matching individual human judgments. Using case studies and formal analysis, we demonstrate how this framework can inform the evaluation and deployment of AI systems in practice.


Generative Bayesian Filtering and Parameter Learning

arXiv.org Machine Learning

Generative Bayesian Filtering (GBF) provides a powerful and flexible framework for performing posterior inference in complex nonlinear and non-Gaussian state-space models. Our approach extends Generative Bayesian Computation (GBC) to dynamic settings, enabling recursive posterior inference using simulation-based methods powered by deep neural networks. GBF does not require explicit density evaluations, making it particularly effective when observation or transition distributions are analytically intractable. To address parameter learning, we introduce the Generative-Gibbs sampler, which bypasses explicit density evaluation by iteratively sampling each variable from its implicit full conditional distribution. Such technique is broadly applicable and enables inference in hierarchical Bayesian models with intractable densities, including state-space models. We assess the performance of the proposed methodologies through both simulated and empirical studies, including the estimation of $α$-stable stochastic volatility models. Our findings indicate that GBF significantly outperforms existing likelihood-free approaches in accuracy and robustness when dealing with intractable state-space models.