 Bayesian Learning


Preferential Normalizing Flows

arXiv.org Machine Learning

Eliciting a high-dimensional probability distribution from an expert via noisy judgments is notoriously challenging, yet useful for many applications, such as prior elicitation and reward modeling. We introduce a method for eliciting the expert's belief density as a normalizing flow based solely on preferential questions such as comparing or ranking alternatives. This allows eliciting in principle arbitrarily flexible densities, but flow estimation is susceptible to the challenge of collapsing or diverging probability mass that makes it difficult in practice. We tackle this problem by introducing a novel functional prior for the flow, motivated by a decision-theoretic argument, and show empirically that the belief density can be inferred as the function-space maximum a posteriori estimate. We demonstrate our method by eliciting multivariate belief densities of simulated experts, including the prior belief of a general-purpose large language model over a real-world dataset.


Conjugate Bayesian Two-step Change Point Detection for Hawkes Process

arXiv.org Machine Learning

The Bayesian two-step change point detection method is popular for the Hawkes process due to its simplicity and intuitiveness. However, the non-conjugacy between the point process likelihood and the prior requires most existing Bayesian two-step change point detection methods to rely on non-conjugate inference methods. These methods lack analytical expressions, leading to low computational efficiency and impeding timely change point detection. To address this issue, this work employs data augmentation to propose a conjugate Bayesian two-step change point detection method for the Hawkes process, which proves to be more accurate and efficient. Extensive experiments on both synthetic and real data demonstrate the superior effectiveness and efficiency of our method compared to baseline methods. Additionally, we conduct ablation studies to explore the robustness of our method concerning various hyperparameters.


Nonlinear Gaussian process tomography with imposed non-negativity constraints on physical quantities for plasma diagnostics

arXiv.org Artificial Intelligence

We propose a novel tomographic method, nonlinear Gaussian process tomography (nonlinear GPT), that employs the Laplace approximation to ensure that reconstructed physical quantities, such as the emissivity in plasma optical diagnostics, are non-negative. The method uses a logarithmic Gaussian process (log-GP) to model the plasma distribution more naturally, thereby overcoming the limitations of standard GPT, which is restricted to linear problems and may yield non-physical negative values. The effectiveness of the proposed log-GP tomography is demonstrated through a case study using the Ring Trap 1 (RT-1) device, where log-GPT outperforms the existing methods, standard GPT and Minimum Fisher Information (MFI), in terms of reconstruction accuracy. The result highlights the effectiveness of nonlinear GPT for imposing physical constraints in inverse problems.


Conditional Density Estimation with Histogram Trees

arXiv.org Artificial Intelligence

Conditional density estimation (CDE) goes beyond regression by modeling the full conditional distribution, providing a richer understanding of the data than the conditional mean alone. This makes CDE particularly useful in critical application domains. However, interpretable CDE methods are understudied. Current methods typically employ kernel-based approaches, using kernel functions directly for kernel density estimation or as basis functions in linear models. In contrast, tree-based methods, which are arguably more comprehensible, have been largely overlooked for CDE tasks despite their conceptual simplicity and suitability for visualization. Thus, we propose the Conditional Density Tree (CDTree), a fully non-parametric model consisting of a decision tree in which each leaf is formed by a histogram model. Specifically, we formalize the problem of learning a CDTree using the minimum description length (MDL) principle, which eliminates the need to tune a regularization hyperparameter. Next, we propose an iterative algorithm that greedily searches for the optimal histogram for every possible node split. Our experiments demonstrate that, in comparison to existing interpretable CDE methods, CDTrees are both more accurate (as measured by the log-loss) and more robust against irrelevant features. Further, our approach leads to smaller tree sizes than existing tree-based models, which benefits interpretability.
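To make the idea of a decision tree with histogram leaves concrete, here is a minimal sketch in Python. It is not the CDTree algorithm: the split threshold, the bin count, and the function names `fit_histogram` and `predict_density` are all hypothetical and fixed by hand, whereas the abstract describes choosing splits and histograms via the MDL principle.

```python
def fit_histogram(ys, lo, hi, n_bins):
    """Equal-width histogram density on [lo, hi] (assumes every y lies in that range)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for y in ys:
        # Clamp so that y == hi falls into the last bin.
        idx = min(int((y - lo) / width), n_bins - 1)
        counts[idx] += 1
    # Normalise counts so the piecewise-constant density integrates to 1.
    return [c / (len(ys) * width) for c in counts]


def predict_density(x, y, threshold, left_leaf, right_leaf, lo, hi):
    """Density of y given x for a depth-one tree: the split on x selects the leaf histogram."""
    leaf = left_leaf if x <= threshold else right_leaf
    width = (hi - lo) / len(leaf)
    idx = min(int((y - lo) / width), len(leaf) - 1)
    return leaf[idx]


# Hypothetical toy data: the targets sit low when x <= 0.5 and high otherwise.
left = fit_histogram([0.1, 0.2, 0.3, 0.4], lo=0.0, hi=1.0, n_bins=2)
right = fit_histogram([0.6, 0.7, 0.8, 0.9], lo=0.0, hi=1.0, n_bins=2)
density = predict_density(0.2, 0.25, threshold=0.5, left_leaf=left, right_leaf=right, lo=0.0, hi=1.0)
```

Each leaf stores a full density rather than a single number, which is what lets a model of this kind report more than the conditional mean.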


Learning to rumble: Automated elephant call classification, detection and endpointing using deep architectures

arXiv.org Artificial Intelligence

We consider the problem of detecting, isolating and classifying elephant calls in continuously recorded audio. Such automatic call characterisation can assist conservation efforts and inform environmental management strategies. In contrast to previous work in which call detection was performed at a segment level, we perform call detection at a frame level which implicitly also allows call endpointing, the isolation of a call in a longer recording. For experimentation, we employ two annotated datasets, one containing Asian and the other African elephant vocalisations. We evaluate several shallow and deep classifier models, and show that the current best performance can be improved by using an audio spectrogram transformer (AST), a neural architecture which has not been used for this purpose before, and which we have configured in a novel sequence-to-sequence manner. We also show that using transfer learning by pre-training leads to further improvements both in terms of computational complexity and performance. Finally, we consider sub-call classification using an accepted taxonomy of call types, a task which has not previously been considered. We show that also in this case the transformer architectures provide the best performance. Our best classifiers achieve an average precision (AP) of 0.962 for framewise binary call classification, and an area under the receiver operating characteristic (AUC) of 0.957 and 0.979 for call classification with 5 classes and sub-call classification with 7 classes respectively. All of these represent either new benchmarks (sub-call classifications) or improvements on previously best systems. We conclude that a fully-automated elephant call detection and subcall classification system is within reach. Such a system would provide valuable information on the behaviour and state of elephant herds for the purposes of conservation and management.


Toward Universal and Interpretable World Models for Open-ended Learning Agents

arXiv.org Artificial Intelligence

We introduce a generic, compositional and interpretable class of generative world models that supports open-ended learning agents. This is a sparse class of Bayesian networks capable of approximating a broad range of stochastic processes, which provides agents with the ability to learn world models in a manner that may be both interpretable and computationally scalable. This approach, which integrates Bayesian structure learning and intrinsically motivated (model-based) planning, enables agents to actively develop and refine their world models, which may lead to developmental learning and more robust, adaptive behavior.


Deep Optimal Sensor Placement for Black Box Stochastic Simulations

arXiv.org Machine Learning

Selecting cost-effective optimal sensor configurations for subsequent inference of parameters in black-box stochastic systems faces significant computational barriers. We propose a novel and robust approach, modelling the joint distribution over input parameters and solution with a joint energy-based model, trained on simulation data. Unlike existing simulation-based inference approaches, which must be tied to a specific set of point evaluations, we learn a functional representation of parameters and solution. This is used as a resolution-independent plug-and-play surrogate for the joint distribution, which can be conditioned over any set of points, permitting an efficient approach to sensor placement. We demonstrate the validity of our framework on a variety of stochastic problems, showing that our method provides highly informative sensor locations at a lower computational cost compared to conventional approaches.


Learning with Importance Weighted Variational Inference: Asymptotics for Gradient Estimators of the VR-IWAE Bound

arXiv.org Machine Learning

Several popular variational bounds involving importance weighting ideas have been proposed to generalize and improve on the Evidence Lower BOund (ELBO) in the context of maximum likelihood optimization, such as the Importance Weighted Auto-Encoder (IWAE) and the Variational Rényi (VR) bounds. The methodology to learn the parameters of interest using these bounds typically amounts to running gradient-based variational inference algorithms that incorporate the reparameterization trick. However, the way the choice of the variational bound impacts the outcome of variational inference algorithms can be unclear. Recently, the VR-IWAE bound was introduced as a variational bound that unifies the ELBO, IWAE and VR bounds methodologies. In this paper, we provide two analyses for the reparameterized and doubly-reparameterized gradient estimators of the VR-IWAE bound, which reveal the advantages and limitations of these gradient estimators while enabling us to compare the ELBO, IWAE and VR bounds methodologies. Our work advances the understanding of importance weighted variational inference methods and we illustrate our theoretical findings empirically.
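As a rough illustration of how a single bound can cover these cases, the sketch below estimates a VR-IWAE-style quantity from N log importance weights. The formula, (1/(1-alpha)) * log((1/N) * sum_i w_i^(1-alpha)), is the form usually given for this bound and is an assumption here, since the abstract does not state it. The function name `vr_iwae_estimate` is hypothetical.

```python
import math


def vr_iwae_estimate(log_weights, alpha):
    """Single-batch estimate of (1/(1-alpha)) * log(mean_i w_i^(1-alpha)).

    log_weights holds log w_i = log p(x, z_i) - log q(z_i | x) for N samples z_i ~ q.
    alpha must lie in [0, 1); alpha = 0 gives the usual IWAE estimate.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    n = len(log_weights)
    scaled = [(1.0 - alpha) * lw for lw in log_weights]
    # Log-mean-exp with the maximum subtracted for numerical stability.
    m = max(scaled)
    log_mean = m + math.log(sum(math.exp(s - m) for s in scaled) / n)
    return log_mean / (1.0 - alpha)


# With a single sample (N = 1) the estimate reduces to log w, the ELBO integrand.
single = vr_iwae_estimate([-2.3], alpha=0.3)
# With alpha = 0 it is the log of the average weight, as in IWAE.
iwae = vr_iwae_estimate([0.0, math.log(3.0)], alpha=0.0)
```

Averaging such estimates over many batches of samples gives a Monte Carlo approximation of the bound. The gradient estimators analysed in the paper are not reproduced here.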


Bayesian Experimental Design via Contrastive Diffusions

arXiv.org Machine Learning

Bayesian Optimal Experimental Design (BOED) is a powerful tool to reduce the cost of running a sequence of experiments. When based on the Expected Information Gain (EIG), design optimization corresponds to the maximization of some intractable expected contrast between prior and posterior distributions. Scaling this maximization to high-dimensional and complex settings has been an issue due to BOED's inherent computational complexity. In this work, we introduce an expected posterior distribution with cost-effective sampling properties and provide tractable access to the EIG contrast maximization via a new EIG gradient expression. Diffusion-based samplers are used to compute the dynamics of the expected posterior, and ideas from bi-level optimization are leveraged to derive an efficient joint sampling-optimization loop, without resorting to lower bound approximations of the EIG. The resulting efficiency gain makes it possible to extend BOED to the well-tested generative capabilities of diffusion models. By incorporating generative models into the BOED framework, we expand its scope and its use in scenarios that were previously impractical. Numerical experiments and comparison with state-of-the-art methods show the potential of the approach.


Differentiable Programming for Computational Plasma Physics

arXiv.org Artificial Intelligence

Differentiable programming allows for derivatives of functions implemented via computer code to be calculated automatically. These derivatives are calculated using automatic differentiation (AD). This thesis explores two applications of differentiable programming to computational plasma physics. First, we consider how differentiable programming can be used to simplify and improve stellarator optimization. We introduce a stellarator coil design code (FOCUSADD) that uses gradient-based optimization to produce stellarator coils with finite build. Because we use reverse mode AD, which can compute gradients of scalar functions with the same computational complexity as the function, FOCUSADD is simple, flexible, and efficient. We then discuss two additional applications of AD in stellarator optimization. Second, we explore how machine learning (ML) can be used to improve or replace the numerical methods used to solve partial differential equations (PDEs), focusing on time-dependent PDEs in fluid mechanics relevant to plasma physics. Differentiable programming allows neural networks and other techniques from ML to be embedded within numerical methods. This is a promising, but relatively new, research area. We focus on two basic questions. First, can we design ML-based PDE solvers that have the same guarantees of conservation, stability, and positivity that standard numerical methods do? The answer is yes; we introduce error-correcting algorithms that preserve invariants of time-dependent PDEs. Second, which types of ML-based solvers work best at solving PDEs? We perform a systematic review of the scientific literature on solving PDEs with ML. Unfortunately, we discover two issues, weak baselines and reporting biases, that affect the interpretation and reproducibility of a significant majority of published research. We conclude that using ML to solve PDEs is not as promising as we initially believed.
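The reverse-mode AD mentioned in this abstract can be illustrated with a small, self-contained Python sketch. It is not taken from FOCUSADD or from any AD library. The `Var` class and the `sin` and `backward` helpers are hypothetical names chosen for illustration, and only addition, multiplication and sine are supported.

```python
import math


class Var:
    """A scalar that records how it was computed so gradients can flow backwards."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent Var, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def sin(x):
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node in the graph."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # post-order: inputs come before the values built from them

    visit(output)
    output.grad = 1.0
    # Walk from the output back to the inputs, applying the chain rule once per edge.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


x, y = Var(2.0), Var(3.0)
z = x * y + sin(x)  # f(x, y) = x*y + sin(x)
backward(z)         # one backward pass fills in both partial derivatives
```

A single backward pass yields the derivative with respect to every input, which is why the cost of a gradient stays comparable to the cost of evaluating the function itself.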