 Bayesian Inference


Credal Two-Sample Tests of Epistemic Ignorance

arXiv.org Machine Learning

Science is inherently inductive and thus involves uncertainties. These are commonly categorized as aleatoric uncertainty (AU), which refers to inherent variability, and epistemic uncertainty (EU), which arises from limited information such as finite data or model assumptions (Hora, 1996). The two often overlap, as scientists may be epistemically uncertain about the aleatoric variation in their inquiry. Distinguishing and acknowledging them is crucial for the safe and trustworthy deployment of intelligent systems (Kendall and Gal, 2017; Hüllermeier and Waegeman, 2021), as they lead to different downstream decisions. For example, experimental design aims to reduce EU (Nguyen et al., 2019; Chau et al., 2021b; Adachi et al., 2024), while risk management uses hedging strategies to address AU (Mashrur et al., 2020). While AU is often modelled using probability distributions, modelling EU, particularly in states of epistemic ignorance, also known as partial ignorance or incomplete knowledge (Dubois et al., 1996), poses greater challenges. For instance, a scientist analysing insulin levels in Germany may have data from multiple hospitals, each representing aleatoric variation as a probability distribution. However, these distributions are merely proxies for the population-level insulin distribution, which is difficult to infer due to data collection limitations. A Bayesian approach could aggregate the data based on a prior if the representativeness of each source is known, but in many cases scientists operate under partial ignorance and lack such prior information (Bromberger, 1971). Assigning a uniform prior by following the principle of indifference (Keynes, 1921) and the maximum entropy principle (Jaynes, 1957), or applying the Jeffreys prior by following the principle of transformation groups (Jaynes, 1968), only reflects indifference, not epistemic ignorance.


Training Neural Samplers with Reverse Diffusive KL Divergence

arXiv.org Machine Learning

Training generative models to sample from unnormalized density functions is an important and challenging task in machine learning. Traditional training methods often rely on the reverse Kullback-Leibler (KL) divergence due to its tractability. However, the mode-seeking behavior of reverse KL hinders effective approximation of multi-modal target distributions. To address this, we propose to minimize the reverse KL along diffusion trajectories of both model and target densities. We refer to this objective as the reverse diffusive KL divergence, which allows the model to capture multiple modes. Leveraging this objective, we train neural samplers that can efficiently generate samples from the target distribution in one step. We demonstrate that our method enhances sampling performance across various Boltzmann distributions, including both synthetic multi-modal densities and n-body particle systems.
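The mode-seeking behavior of the plain reverse KL mentioned above can be seen in a small numerical sketch. This is not the paper's reverse diffusive KL objective; it only compares KL(q || p) and KL(p || q) for a unit-variance Gaussian model q against a toy bimodal target p. The target (an equal mixture of N(-3, 1) and N(3, 1)), the grid, and all function names are assumptions made for illustration.

```python
import math


def log_normal(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2


def log_target(x):
    """Log density of the toy target: 0.5 * N(-3, 1) + 0.5 * N(3, 1)."""
    a = math.log(0.5) + log_normal(x, -3.0, 1.0)
    b = math.log(0.5) + log_normal(x, 3.0, 1.0)
    m = max(a, b)  # log-sum-exp for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def kl(log_a, log_b, lo=-15.0, hi=15.0, n=6001):
    """Approximate KL(a || b) with a Riemann sum on a uniform grid."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        la = log_a(x)
        total += math.exp(la) * (la - log_b(x)) * h
    return total


def model_at(mu):
    """Unit-variance Gaussian model centred at mu (hypothetical model family)."""
    return lambda x: log_normal(x, mu, 1.0)


on_mode = model_at(3.0)   # sits on one mode of the target
between = model_at(0.0)   # sits between the two modes

reverse_on_mode = kl(on_mode, log_target)   # KL(q || p)
reverse_between = kl(between, log_target)
forward_on_mode = kl(log_target, on_mode)   # KL(p || q)
forward_between = kl(log_target, between)

print("reverse KL: on mode", reverse_on_mode, "between modes", reverse_between)
print("forward KL: on mode", forward_on_mode, "between modes", forward_between)
```

In this toy setting the reverse KL prefers the model that locks onto a single mode, while the forward KL prefers the model placed between the modes, which is the contrast that motivates looking for alternatives to the plain reverse KL.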


Preferential Normalizing Flows

arXiv.org Machine Learning

Eliciting a high-dimensional probability distribution from an expert via noisy judgments is notoriously challenging, yet useful for many applications, such as prior elicitation and reward modeling. We introduce a method for eliciting the expert's belief density as a normalizing flow based solely on preferential questions such as comparing or ranking alternatives. This allows eliciting in principle arbitrarily flexible densities, but flow estimation is susceptible to the challenge of collapsing or diverging probability mass that makes it difficult in practice. We tackle this problem by introducing a novel functional prior for the flow, motivated by a decision-theoretic argument, and show empirically that the belief density can be inferred as the function-space maximum a posteriori estimate. We demonstrate our method by eliciting multivariate belief densities of simulated experts, including the prior belief of a general-purpose large language model over a real-world dataset.


Conjugate Bayesian Two-step Change Point Detection for Hawkes Process

arXiv.org Machine Learning

The Bayesian two-step change point detection method is popular for the Hawkes process due to its simplicity and intuitiveness. However, the non-conjugacy between the point process likelihood and the prior requires most existing Bayesian two-step change point detection methods to rely on non-conjugate inference methods. These methods lack analytical expressions, leading to low computational efficiency and impeding timely change point detection. To address this issue, this work employs data augmentation to propose a conjugate Bayesian two-step change point detection method for the Hawkes process, which proves to be more accurate and efficient. Extensive experiments on both synthetic and real data demonstrate the superior effectiveness and efficiency of our method compared to baseline methods. Additionally, we conduct ablation studies to explore the robustness of our method concerning various hyperparameters.


Nonlinear Gaussian process tomography with imposed non-negativity constraints on physical quantities for plasma diagnostics

arXiv.org Artificial Intelligence

We propose a novel tomographic method, nonlinear Gaussian process tomography (nonlinear GPT), that employs the Laplace approximation to ensure the non-negativity of physical quantities, such as the emissivity in plasma optical diagnostics. The method uses a logarithmic Gaussian process (log-GP) to model the plasma distribution more naturally, thereby overcoming the limitations of standard GPT, which is restricted to linear problems and may yield non-physical negative values. The effectiveness of the proposed log-GP tomography is demonstrated through a case study using the Ring Trap 1 (RT-1) device, where log-GPT outperforms existing methods, namely standard GPT and the Minimum Fisher Information (MFI) method, in terms of reconstruction accuracy. The result highlights the effectiveness of nonlinear GPT for imposing physical constraints in inverse problems.
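The reason a log-GP cannot produce negative values can be sketched in a few lines: a latent function f is drawn from a Gaussian process and the physical quantity is taken to be exp(f). The sketch below only draws from a prior; it does not implement the tomographic forward model or the Laplace approximation described in the abstract. The RBF kernel, its hyperparameters, the grid, and all names are assumptions chosen for illustration.

```python
import math
import random


def rbf_kernel(xs, length_scale=0.2, variance=1.0, jitter=1e-6):
    """RBF covariance matrix with a small diagonal jitter for stability."""
    n = len(xs)
    return [
        [
            variance * math.exp(-0.5 * ((xs[i] - xs[j]) / length_scale) ** 2)
            + (jitter if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_log_gp(xs, rng):
    """Draw f ~ GP(0, K) on xs and return (f, exp(f))."""
    low = cholesky(rbf_kernel(xs))
    z = [rng.gauss(0.0, 1.0) for _ in xs]
    f = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(xs))]
    return f, [math.exp(v) for v in f]


rng = random.Random(0)
grid = [i / 24 for i in range(25)]          # 25 points on [0, 1]
latent, emissivity = sample_log_gp(grid, rng)

print("min latent value:", min(latent))     # may be negative
print("min emissivity:", min(emissivity))   # always strictly positive
```

The latent sample is free to take either sign, whereas the exponentiated quantity is positive by construction; the price is that the observation model becomes nonlinear in f, which is why an approximation such as Laplace's is needed for the posterior.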


Conditional Density Estimation with Histogram Trees

arXiv.org Artificial Intelligence

Conditional density estimation (CDE) goes beyond regression by modeling the full conditional distribution, providing a richer understanding of the data than the conditional mean alone. This makes CDE particularly useful in critical application domains. However, interpretable CDE methods are understudied. Current methods typically employ kernel-based approaches, using kernel functions directly for kernel density estimation or as basis functions in linear models. In contrast, despite their conceptual simplicity and suitability for visualization, tree-based methods, which are arguably more comprehensible, have been largely overlooked for CDE tasks. Thus, we propose the Conditional Density Tree (CDTree), a fully non-parametric model consisting of a decision tree in which each leaf is formed by a histogram model. Specifically, we formalize the problem of learning a CDTree using the minimum description length (MDL) principle, which eliminates the need to tune a regularization hyperparameter. Next, we propose an iterative algorithm that, albeit greedily, searches for the optimal histogram for every possible node split. Our experiments demonstrate that, in comparison to existing interpretable CDE methods, CDTrees are both more accurate (as measured by the log-loss) and more robust against irrelevant features. Further, our approach leads to smaller tree sizes than existing tree-based models, which benefits interpretability.
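The data structure described above, a decision tree whose leaves hold histograms, can be sketched with a single split. This is not the CDTree learning algorithm: the split threshold and the number of bins are fixed by hand rather than chosen by MDL, and the class name, function names, and toy data are hypothetical.

```python
def histogram_density(values, lo, hi, n_bins):
    """Piecewise-constant density on [lo, hi] estimated from values in that range."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp v == hi into last bin
        counts[idx] += 1
    return [c / (len(values) * width) for c in counts]


class HistogramStump:
    """Depth-1 tree: one split on x, one histogram over y in each leaf."""

    def __init__(self, data, threshold, lo, hi, n_bins):
        self.threshold, self.lo, self.hi, self.n_bins = threshold, lo, hi, n_bins
        left = [y for x, y in data if x < threshold]
        right = [y for x, y in data if x >= threshold]
        self.left = histogram_density(left, lo, hi, n_bins)
        self.right = histogram_density(right, lo, hi, n_bins)

    def density(self, y, x):
        """Estimated conditional density p(y | x)."""
        heights = self.left if x < self.threshold else self.right
        width = (self.hi - self.lo) / self.n_bins
        idx = min(int((y - self.lo) / width), self.n_bins - 1)
        return heights[idx]


# Toy (x, y) pairs: small x goes with small y, large x with large y.
data = [(0.1, 0.2), (0.2, 0.3), (0.3, 0.1), (0.4, 0.4),
        (0.6, 0.7), (0.7, 0.9), (0.8, 0.8), (0.9, 0.6)]
stump = HistogramStump(data, threshold=0.5, lo=0.0, hi=1.0, n_bins=2)

print(stump.density(0.25, x=0.2))  # y in the lower bin, left leaf
print(stump.density(0.25, x=0.8))  # y in the lower bin, right leaf
```

Each leaf's histogram integrates to one over the range of y, so the stump defines a valid conditional density for every x, and the whole model can be read off as a short table of bin heights per leaf.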


Toward Universal and Interpretable World Models for Open-ended Learning Agents

arXiv.org Artificial Intelligence

We introduce a generic, compositional and interpretable class of generative world models that supports open-ended learning agents. This is a sparse class of Bayesian networks capable of approximating a broad range of stochastic processes, providing agents with the ability to learn world models in a manner that may be both interpretable and computationally scalable. This approach, which integrates Bayesian structure learning and intrinsically motivated (model-based) planning, enables agents to actively develop and refine their world models, which may lead to developmental learning and more robust, adaptive behavior.


Deep Optimal Sensor Placement for Black Box Stochastic Simulations

arXiv.org Machine Learning

Selecting cost-effective optimal sensor configurations for subsequent inference of parameters in black-box stochastic systems faces significant computational barriers. We propose a novel and robust approach, modelling the joint distribution over input parameters and solution with a joint energy-based model, trained on simulation data. Unlike existing simulation-based inference approaches, which must be tied to a specific set of point evaluations, we learn a functional representation of parameters and solution. This is used as a resolution-independent plug-and-play surrogate for the joint distribution, which can be conditioned over any set of points, permitting an efficient approach to sensor placement. We demonstrate the validity of our framework on a variety of stochastic problems, showing that our method provides highly informative sensor locations at a lower computational cost compared to conventional approaches.


Learning with Importance Weighted Variational Inference: Asymptotics for Gradient Estimators of the VR-IWAE Bound

arXiv.org Machine Learning

Several popular variational bounds involving importance weighting ideas have been proposed to generalize and improve on the Evidence Lower BOund (ELBO) in the context of maximum likelihood optimization, such as the Importance Weighted Auto-Encoder (IWAE) and the Variational Rényi (VR) bounds. The methodology to learn the parameters of interest using these bounds typically amounts to running gradient-based variational inference algorithms that incorporate the reparameterization trick. However, the way the choice of the variational bound impacts the outcome of variational inference algorithms can be unclear. Recently, the VR-IWAE bound was introduced as a variational bound that unifies the ELBO, IWAE and VR bound methodologies. In this paper, we provide two analyses for the reparameterized and doubly-reparameterized gradient estimators of the VR-IWAE bound, which reveal the advantages and limitations of these gradient estimators while enabling us to compare the ELBO, IWAE and VR bound methodologies. Our work advances the understanding of importance weighted variational inference methods, and we illustrate our theoretical findings empirically.
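How one estimator can cover the ELBO and IWAE cases may be easier to see in code. The sketch below assumes the VR-IWAE bound is estimated as (1/(1-alpha)) * log((1/N) * sum_j w_j^(1-alpha)) for alpha in [0, 1), with importance weights w_j = p(x, z_j) / q(z_j); this form is stated from memory and should be checked against the paper. The toy model (z ~ N(0, 1), x | z ~ N(z, 1)), the variational parameters, and all names are assumptions, and no gradients are computed.

```python
import math
import random


def log_normal(x, mu, var):
    """Log density of N(mu, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mu) ** 2 / var


def log_weights(x, n, m, s, rng):
    """Draw z_j ~ q = N(m, s^2); return log w_j = log p(x, z_j) - log q(z_j)."""
    out = []
    for _ in range(n):
        z = rng.gauss(m, s)
        log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
        out.append(log_joint - log_normal(z, m, s * s))
    return out


def vr_iwae(log_w, alpha):
    """(1/(1-alpha)) * log((1/N) * sum_j w_j^(1-alpha)), computed in log space."""
    scaled = [(1.0 - alpha) * lw for lw in log_w]
    mx = max(scaled)
    lse = mx + math.log(sum(math.exp(v - mx) for v in scaled))
    return (lse - math.log(len(log_w))) / (1.0 - alpha)


rng = random.Random(0)
x_obs = 1.5
log_w = log_weights(x_obs, n=64, m=0.0, s=1.0, rng=rng)

elbo_est = sum(log_w) / len(log_w)        # average of log-weights
vr_iwae_est = vr_iwae(log_w, alpha=0.5)   # intermediate alpha
iwae_est = vr_iwae(log_w, alpha=0.0)      # alpha = 0 gives the IWAE estimate

print("ELBO-style estimate:", elbo_est)
print("VR-IWAE (alpha=0.5):", vr_iwae_est)
print("IWAE estimate:", iwae_est)
print("exact log p(x):", log_normal(x_obs, 0.0, 2.0))
```

On any fixed set of weights the three estimates are ordered, ELBO-style at most VR-IWAE at most IWAE, by Jensen's and the power-mean inequalities, and with a single sample (N = 1) the expression reduces to log w for every alpha.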


Bayesian Experimental Design via Contrastive Diffusions

arXiv.org Machine Learning

Bayesian Optimal Experimental Design (BOED) is a powerful tool to reduce the cost of running a sequence of experiments. When based on the Expected Information Gain (EIG), design optimization corresponds to the maximization of some intractable expected contrast between prior and posterior distributions. Scaling this maximization to high-dimensional and complex settings has been an issue due to BOED's inherent computational complexity. In this work, we introduce an expected posterior distribution with cost-effective sampling properties and provide tractable access to the EIG contrast maximization via a new EIG gradient expression. Diffusion-based samplers are used to compute the dynamics of the expected posterior, and ideas from bi-level optimization are leveraged to derive an efficient joint sampling-optimization loop, without resorting to lower bound approximations of the EIG. The resulting efficiency gain allows BOED to be extended to the well-tested generative capabilities of diffusion models. By incorporating generative models into the BOED framework, we expand its scope and its use in scenarios that were previously impractical. Numerical experiments and comparison with state-of-the-art methods show the potential of the approach.
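To make the EIG concrete, the sketch below evaluates it in a setting where it is tractable, which is precisely what the general case lacks. It is not the paper's contrastive-diffusion method. The assumed toy model is linear-Gaussian: theta ~ N(0, 1) and y = design * theta + noise with noise ~ N(0, noise_var), for which the EIG equals 0.5 * log(1 + design^2 / noise_var) and the marginal of y is N(0, design^2 + noise_var). All names and parameter values are hypothetical.

```python
import math
import random


def log_normal(x, mu, var):
    """Log density of N(mu, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mu) ** 2 / var


def eig_closed_form(design, noise_var=1.0):
    """Exact EIG for the assumed linear-Gaussian model."""
    return 0.5 * math.log(1.0 + design**2 / noise_var)


def eig_monte_carlo(design, noise_var=1.0, n=20000, seed=0):
    """Monte Carlo EIG: average of log p(y | theta, design) - log p(y | design).

    The marginal p(y | design) is available in closed form only because the
    model is linear-Gaussian; in general it is the intractable term.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        y = design * theta + rng.gauss(0.0, math.sqrt(noise_var))
        log_lik = log_normal(y, design * theta, noise_var)
        log_marg = log_normal(y, 0.0, design**2 + noise_var)
        total += log_lik - log_marg
    return total / n


for d in (0.5, 1.0, 2.0):
    print(d, eig_closed_form(d), eig_monte_carlo(d))
```

In this toy model a larger design magnitude always yields more information, so maximizing the EIG is trivial; the difficulty addressed in the abstract arises when neither the marginal likelihood nor the posterior can be evaluated.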