 Bayesian Inference


Mitigating mode collapse in normalizing flows by annealing with an adaptive schedule: Application to parameter estimation

arXiv.org Machine Learning

Normalizing flows (NFs) provide uncorrelated samples from complex distributions, making them an appealing tool for parameter estimation. However, the practical utility of NFs remains limited by their tendency to collapse to a single mode of a multimodal distribution. In this study, we show that annealing with an adaptive schedule based on the effective sample size (ESS) can mitigate mode collapse. We demonstrate that our approach can converge the marginal likelihood for a biochemical oscillator model fit to time-series data in ten-fold less computation time than a widely used ensemble Markov chain Monte Carlo (MCMC) method. We show that the ESS can also be used to reduce variance by pruning the samples. We expect these developments to be of general use for sampling with NFs and discuss potential opportunities for further improvements.
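The ESS-based adaptive schedule mentioned in the abstract above can be illustrated with a generic sketch. This is not the authors' implementation: it assumes a tempered target proportional to prior times likelihood raised to an inverse temperature `beta`, samples already drawn at the current `beta`, and a hypothetical target ESS fraction of 0.5. The function names `effective_sample_size` and `next_beta` are made up for this example.

```python
import numpy as np

def effective_sample_size(log_w):
    """Kish effective sample size from unnormalized log importance weights."""
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return w.sum() ** 2 / (w ** 2).sum()

def next_beta(log_like, beta, target_frac=0.5, tol=1e-6):
    """Pick the next inverse temperature by bisection (illustrative only).

    log_like: log-likelihood of each sample drawn at the current beta.
    Moving from beta to beta_new reweights sample i by
    exp((beta_new - beta) * log_like[i]); the largest step that keeps
    the ESS at or above target_frac * n is returned.
    """
    log_like = np.asarray(log_like, dtype=float)
    target = target_frac * len(log_like)
    # If jumping straight to beta = 1 keeps the ESS high enough, do so.
    if effective_sample_size((1.0 - beta) * log_like) >= target:
        return 1.0
    lo, hi = beta, 1.0  # lo always satisfies the ESS constraint
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if effective_sample_size((mid - beta) * log_like) >= target:
            lo = mid
        else:
            hi = mid
    return lo
```

In a full annealing loop one would call `next_beta` after each round of training or resampling, stopping once it returns 1.0. The bisection relies on the ESS decreasing as the temperature step grows, which is the usual behaviour but is assumed here rather than proven.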


Modeling Spatial Extremes using Non-Gaussian Spatial Autoregressive Models via Convolutional Neural Networks

arXiv.org Machine Learning

Data derived from remote sensing or numerical simulations often have a regular gridded structure and are large in volume, making it challenging to find accurate spatial models that can fill in missing grid cells or simulate the process effectively, especially in the presence of spatial heterogeneity and heavy-tailed marginal distributions. To overcome this issue, we present a spatial autoregressive modeling framework, which maps observations at a location and its neighbors to independent random variables. This is a highly flexible modeling approach and well-suited for non-Gaussian fields, providing simpler interpretability. In particular, we consider the SAR model with Generalized Extreme Value distribution innovations to combine the observation at a central grid location with its neighbors, capturing extreme spatial behavior based on the heavy-tailed innovations. While these models are fast to simulate by exploiting the sparsity of the key matrices in the computations, the maximum likelihood estimation of the parameters is prohibitive due to the intractability of the likelihood, making optimization challenging. To overcome this, we train a convolutional neural network on a large training set that covers a useful parameter space, and then use the trained network for fast parameter estimation. Finally, we apply this model to analyze annual maximum precipitation data from ERA-Interim-driven Weather Research and Forecasting (WRF) simulations, allowing us to explore its spatial extreme behavior across North America.


Nonparametric learning of covariate-based Markov jump processes using RKHS techniques

arXiv.org Machine Learning

We propose a novel nonparametric approach for linking covariates to Continuous Time Markov Chains (CTMCs) using the mathematical framework of Reproducing Kernel Hilbert Spaces (RKHS). CTMCs provide a robust framework for modeling transitions across clinical or behavioral states, but traditional multistate models often rely on linear relationships. In contrast, we use a generalized Representer Theorem to enable tractable inference in functional space. For the Frequentist version, we apply normed square penalties, while for the Bayesian version, we explore sparsity-inducing spike and slab priors. Due to the computational challenges posed by high-dimensional spaces, we successfully adapt the Expectation Maximization Variable Selection (EMVS) algorithm to efficiently identify the posterior mode. We demonstrate the effectiveness of our method through extensive simulation studies and an application to follicular cell lymphoma data. Our performance metrics include the normalized difference between estimated and true nonlinear transition functions, as well as the difference in the probability of getting absorbed in one of the final states, capturing the ability of our approach to predict long-term behaviors.


Decision Making under Model Misspecification: DRO with Robust Bayesian Ambiguity Sets

arXiv.org Machine Learning

Distributionally Robust Optimisation (DRO) protects risk-averse decision-makers by considering the worst-case risk within an ambiguity set of distributions based on the empirical distribution or a model. To further guard against finite, noisy data, model-based approaches admit Bayesian formulations that propagate uncertainty from the posterior to the decision-making problem. However, when the model is misspecified, the decision maker must stretch the ambiguity set to contain the data-generating process (DGP), leading to overly conservative decisions. We address this challenge by introducing DRO with Robust, to model misspecification, Bayesian Ambiguity Sets (DRO-RoBAS). These are Maximum Mean Discrepancy ambiguity sets centred at a robust posterior predictive distribution that incorporates beliefs about the DGP. We show that the resulting optimisation problem obtains a dual formulation in the Reproducing Kernel Hilbert Space and we give probabilistic guarantees on the tolerance level of the ambiguity set. Our method outperforms other Bayesian and empirical DRO approaches in out-of-sample performance on the Newsvendor and Portfolio problems with various cases of model misspecification.


A Symbolic and Statistical Learning Framework to Discover Bioprocessing Regulatory Mechanism: Cell Culture Example

arXiv.org Machine Learning

Bioprocess mechanistic modeling is essential for advancing intelligent digital twin representation of biomanufacturing, yet challenges persist due to complex intracellular regulation, stochastic system behavior, and limited experimental data. This paper introduces a symbolic and statistical learning framework to identify key regulatory mechanisms and quantify model uncertainty. Bioprocess dynamics is formulated with stochastic differential equations characterizing intrinsic process variability, with a predefined set of candidate regulatory mechanisms constructed from biological knowledge. A Bayesian learning approach is developed, which is based on a joint learning of kinetic parameters and regulatory structure through a formulation of the mixture model. To enhance computational efficiency, a Metropolis-adjusted Langevin algorithm with adjoint sensitivity analysis is developed for posterior exploration. Compared to state-of-the-art Bayesian inference approaches, the proposed framework achieves improved sample efficiency and robust model selection. An empirical study demonstrates its ability to recover missing regulatory mechanisms and improve model fidelity under data-limited conditions.


Uncertainty Quantification for Machine Learning in Healthcare: A Survey

arXiv.org Artificial Intelligence

Uncertainty Quantification (UQ) is pivotal in enhancing the robustness, reliability, and interpretability of Machine Learning (ML) systems for healthcare, optimizing resources and improving patient care. Despite the emergence of ML-based clinical decision support tools, the lack of principled quantification of uncertainty in ML models remains a major challenge. Current reviews have a narrow focus on analyzing the state-of-the-art UQ in specific healthcare domains without systematically evaluating method efficacy across different stages of model development, and despite a growing body of research, its implementation in healthcare applications remains limited. Therefore, in this survey, we provide a comprehensive analysis of current UQ in healthcare, offering an informed framework that highlights how different methods can be integrated into each stage of the ML pipeline, including data processing, training and evaluation. We also highlight the most popular methods used in healthcare and novel approaches from other domains that hold potential for future adoption in the medical context. We expect this study will provide a clear overview of the challenges and opportunities of implementing UQ in the ML pipeline for healthcare, guiding researchers and practitioners in selecting suitable techniques to enhance the reliability and safety of ML-driven healthcare solutions and the trust that patients and clinicians place in them.


Autonomous Cooperative Transportation System involving Multi-Aerial Robots with Variable Attachment Mechanism

arXiv.org Artificial Intelligence

Cooperative transportation by multi-aerial robots has the potential to support various payloads and improve fail-safety against dropping. Furthermore, changing the attachment positions of robots according to payload characteristics increases the stability of transportation. However, there are almost no transportation systems capable of scaling to the payload weight and size and changing the optimal attachment positions. To address this issue, we propose a cooperative transportation system comprising autonomously executable software and suitable hardware, covering the entire process, from pre-takeoff setting to controlled flight. The proposed system decides the formation of the attachment positions by prioritizing controllability based on the center of gravity obtained from Bayesian estimations with robot pairs. We investigated the cooperative transportation of an unknown payload larger than that of whole carrier robots through numerical simulations. Furthermore, we performed cooperative transportation of an unknown payload (with a weight of about 3.2 kg and maximum length of 1.76 m) using eight robots. The proposed system was found to be versatile with regard to handling unknown payloads with different shapes and center-of-gravity positions.


Bayesian learning of the optimal action-value function in a Markov decision process

arXiv.org Machine Learning

The Markov Decision Process (MDP) is a popular framework for sequential decision-making problems, and uncertainty quantification is an essential component of it to learn optimal decision-making strategies. In particular, a Bayesian framework is used to maintain beliefs about the optimal decisions and the unknown ingredients of the model, which are also to be learned from the data, such as the rewards and state dynamics. However, many existing Bayesian approaches for learning the optimal decision-making strategy are based on unrealistic modelling assumptions and utilise approximate inference techniques. This raises doubts whether the benefits of Bayesian uncertainty quantification are fully realised or can be relied upon. We focus on infinite-horizon and undiscounted MDPs, with finite state and action spaces, and a terminal state. We provide a full Bayesian framework, from modelling to inference to decision-making. For modelling, we introduce a likelihood function with minimal assumptions for learning the optimal action-value function based on Bellman's optimality equations, analyse its properties, and clarify connections to existing works. For deterministic rewards, the likelihood is degenerate and we introduce artificial observation noise to relax it, in a controlled manner, to facilitate more efficient Monte Carlo-based inference. For inference, we propose an adaptive sequential Monte Carlo algorithm to both sample from and adjust the sequence of relaxed posterior distributions. For decision-making, we choose actions using samples from the posterior distribution over the optimal strategies. Although this is common practice, we provide new insight showing that it is a generalisation of Thompson sampling from multi-armed bandit problems. Finally, we evaluate our framework on the Deep Sea benchmark problem and demonstrate the exploration benefits of posterior sampling in MDPs.
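For readers unfamiliar with the bandit setting that the abstract above generalises, the following is a minimal sketch of Thompson sampling for a Bernoulli bandit. It is the textbook bandit version, not the paper's MDP algorithm; the uniform Beta(1, 1) priors, the function name `thompson_bernoulli`, and the example arm probabilities are assumptions chosen for illustration.

```python
import numpy as np

def thompson_bernoulli(true_probs, n_rounds, seed=0):
    """Thompson sampling on a Bernoulli bandit with Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    alpha = np.ones(k)  # 1 + number of observed successes per arm
    beta = np.ones(k)   # 1 + number of observed failures per arm
    pulls = np.zeros(k, dtype=int)
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior,
        # then act greedily with respect to that single draw.
        theta = rng.beta(alpha, beta)
        arm = int(np.argmax(theta))
        reward = int(rng.random() < true_probs[arm])
        # Conjugate Beta-Bernoulli posterior update for the pulled arm.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, alpha, beta

pulls, alpha, beta = thompson_bernoulli([0.1, 0.9], n_rounds=500)
```

Acting greedily on a posterior draw, rather than on the posterior mean, is what produces exploration: arms with wide posteriors are occasionally sampled high and therefore tried.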


Latent Variable Estimation in Bayesian Black-Litterman Models

arXiv.org Machine Learning

We revisit the Bayesian Black-Litterman (BL) portfolio model and remove its reliance on subjective investor views. Classical BL requires an investor "view": a forecast vector $q$ and its uncertainty matrix $Ω$ that describe how much a chosen portfolio should outperform the market. Our key idea is to treat $(q,Ω)$ as latent variables and learn them from market data within a single Bayesian network. Consequently, the resulting posterior estimation admits a closed-form expression, enabling fast inference and stable portfolio weights. Building on these, we propose two mechanisms to capture how features interact with returns: shared-latent parametrization and feature-influenced views; both recover classical BL and Markowitz portfolios as special cases. Empirically, on 30-year Dow-Jones and 20-year sector-ETF data, we improve Sharpe ratios by 50% and cut turnover by 55% relative to Markowitz and the index baselines. This work turns BL into a fully data-driven, view-free, and coherent Bayesian framework for portfolio optimization.
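As background for the abstract above, the classical BL posterior that takes $q$ and $Ω$ as given can be sketched as follows. This is the standard textbook formula, not the latent-variable model the paper proposes. Only `q` and `Omega` come from the abstract; the equilibrium return vector `pi`, return covariance `Sigma`, pick matrix `P`, scaling `tau`, and all numbers in the example are the usual BL notation and made-up values, assumed here for illustration.

```python
import numpy as np

def black_litterman_posterior(pi, Sigma, P, q, Omega, tau=0.05):
    """Classical Black-Litterman posterior of expected returns.

    Prior:  mu ~ N(pi, tau * Sigma)
    Views:  q  ~ N(P @ mu, Omega)
    Returns the posterior mean and covariance of mu.
    """
    prior_prec = np.linalg.inv(tau * Sigma)   # precision of the prior
    omega_inv = np.linalg.inv(Omega)          # precision of the views
    post_cov = np.linalg.inv(prior_prec + P.T @ omega_inv @ P)
    post_mean = post_cov @ (prior_prec @ pi + P.T @ omega_inv @ q)
    return post_mean, post_cov

# Hypothetical two-asset example: one view that asset 0 beats asset 1 by 4%.
pi = np.array([0.05, 0.03])
Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
P = np.array([[1.0, -1.0]])
q = np.array([0.04])
mean_vague, _ = black_litterman_posterior(pi, Sigma, P, q, np.array([[1e8]]))
mean_sure, _ = black_litterman_posterior(pi, Sigma, P, q, np.array([[1e-8]]))
```

The two calls show the limiting behaviour: with a very uncertain view the posterior mean stays at the prior `pi`, while with a near-certain view the posterior satisfies the view `P @ mu = q` almost exactly.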


Data-driven Approach for Interpolation of Sparse Data

arXiv.org Machine Learning

Extracting information about hadron resonances requires fitting theoretical models to experimental data. However, this data often comes from different experiments of different physics quantities in varying kinematic regions; studying coupled channels with different kinematic coverages and binning can make direct comparison challenging. The consistency of these datasets directly impacts the quality of the fit, thus making it difficult to accurately constrain the theoretical models. Sparse datasets in key kinematic regions further complicate the quantification of uncertainties, often requiring arbitrary weighting that may introduce bias. A robust approach to solving these problems involves utilising Gaussian Processes (GPs), a Bayesian inference machine learning technique that provides probabilistic predictions for unknown datapoints. Unlike traditional machine learning methods, GPs do not require any training; instead, they operate on three fundamental assumptions: 1. Some kernel function can be defined to measure the covariance between known datapoints; 2. This same kernel function can be used to predict the covariance between unknown datapoints; 3. Some idea of the form of the posterior distribution is known (e.g.