 Bayesian Learning


Energy-Based Prior Latent Space Diffusion model for Reconstruction of Lumbar Vertebrae from Thick Slice MRI

arXiv.org Artificial Intelligence

Lumbar spine problems are ubiquitous, motivating research into targeted imaging for treatment planning and guided interventions. While high-resolution, high-contrast CT has been the modality of choice, MRI can capture both bone and soft tissue without the ionizing radiation of CT, albeit with a longer acquisition time. The critical trade-off between contrast quality and acquisition time has motivated 'thick slice MRI', which prioritises faster imaging with high in-plane resolution but variable contrast and low through-plane resolution. We investigate a recently developed post-acquisition pipeline which segments vertebrae from thick-slice acquisitions and uses a variational autoencoder to enhance quality after an initial 3D reconstruction. We instead propose a latent space diffusion energy-based prior to leverage diffusion models, which exhibit high-quality image generation. Crucially, we mitigate their high computational cost and low sample efficiency by learning an energy-based latent representation in which to perform the diffusion processes. Our resulting method outperforms existing approaches across metrics including Dice and VS scores, and more faithfully captures 3D features.


Optimal Particle-based Approximation of Discrete Distributions (OPAD)

arXiv.org Machine Learning

Particle-based methods include a variety of techniques, such as Markov Chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC), for approximating a probabilistic target distribution with a set of weighted particles. In this paper, we prove that for any set of particles, there is a unique weighting mechanism that minimizes the Kullback-Leibler (KL) divergence of the (particle-based) approximation from the target distribution, when that distribution is discrete. Any other weighting mechanism (e.g. MCMC weighting that is based on particles' repetitions in the Markov chain) is sub-optimal with respect to this divergence measure. Our proof does not require any restrictions on either the target distribution or the process by which the particles are generated, other than the discreteness of the target. We show that the optimal weights can be determined based on values that any existing particle-based method already computes; as such, with minimal modifications and no extra computational costs, the performance of any particle-based method can be improved. Our empirical evaluations are carried out on important applications of discrete distributions including Bayesian Variable Selection and Bayesian Structure Learning. The results illustrate that our proposed reweighting of the particles improves any particle-based approximation to the target distribution consistently and often substantially.
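
The abstract does not spell out the weighting rule, so the following is only a sketch of one reading of it. It assumes the divergence is KL(q‖p), with the approximation q supported on the distinct particles; under that assumption, a standard result is that the minimizer is the target restricted to those particles and renormalized, which needs only the unnormalized target values a sampler already evaluates. The toy target, the particle list, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical discrete target over states 0..3 (assumption for illustration).
TARGET = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}


def log_p(x):
    """Log-probability of state x under the toy target."""
    return math.log(TARGET[x])


def frequency_weights(particles):
    """MCMC-style weights: relative frequency of each distinct particle."""
    counts = Counter(particles)
    n = len(particles)
    return {x: c / n for x, c in counts.items()}


def target_proportional_weights(particles, log_p_tilde):
    """Weights proportional to the (possibly unnormalized) target on distinct particles."""
    logs = {x: log_p_tilde(x) for x in set(particles)}
    m = max(logs.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - m) for v in logs.values())
    return {x: math.exp(v - m) / z for x, v in logs.items()}


def kl_to_target(weights, log_target):
    """KL(q || p) for a weighted-particle approximation q."""
    return sum(w * (math.log(w) - log_target(x)) for x, w in weights.items() if w > 0)


particles = [0, 0, 1, 2, 2, 2]  # state 3 was never visited
w_freq = frequency_weights(particles)
w_opt = target_proportional_weights(particles, log_p)
kl_freq = kl_to_target(w_freq, log_p)
kl_opt = kl_to_target(w_opt, log_p)
print(f"KL with frequency weights:           {kl_freq:.4f}")
print(f"KL with target-proportional weights: {kl_opt:.4f}")
```

In this toy case the reweighted KL equals minus the log of the target mass covered by the distinct particles, so it shrinks as the particles cover more of the target.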


Interval Estimation of Coefficients in Penalized Regression Models of Insurance Data

arXiv.org Machine Learning

The Tweedie exponential dispersion family is a popular choice among many to model insurance losses that consist of zero-inflated semicontinuous data. In such data, it is often important to obtain credibility (inference) of the most important features that describe the endogenous variables. Post-selection inference is the standard procedure in statistics to obtain confidence intervals of model parameters after performing a feature extraction procedure. For a linear model, the lasso estimate often has non-negligible estimation bias for large coefficients corresponding to exogenous variables. To have valid inference on those coefficients, it is necessary to correct the bias of the lasso estimate. Traditional statistical methods, such as hypothesis testing or standard confidence interval construction, might lead to incorrect conclusions during post-selection, as they are generally too optimistic. Here we discuss a few methodologies for constructing confidence intervals of the coefficients after feature selection in the Generalized Linear Model (GLM) family with application to insurance data.


Analysis of High-dimensional Gaussian Labeled-unlabeled Mixture Model via Message-passing Algorithm

arXiv.org Machine Learning

Semi-supervised learning (SSL) is a machine learning methodology that leverages unlabeled data in conjunction with a limited amount of labeled data. Although SSL has been applied in various applications and its effectiveness has been empirically demonstrated, it is still not fully understood when and why SSL performs well. Some existing theoretical studies have attempted to address this issue by modeling classification problems using the so-called Gaussian Mixture Model (GMM). These studies provide notable and insightful interpretations. However, their analyses are focused on specific purposes, and a thorough investigation of the properties of GMM in the context of SSL has been lacking. In this paper, we conduct such a detailed analysis of the properties of the high-dimensional GMM for binary classification in the SSL setting. To this end, we employ the approximate message passing and state evolution methods, which are widely used in high-dimensional settings and originate from statistical mechanics. We deal with two estimation approaches: the Bayesian one and the l2-regularized maximum likelihood estimation (RMLE). We conduct a comprehensive comparison between these two approaches, examining aspects such as the global phase diagram, estimation error for the parameters, and prediction error for the labels. A specific comparison is made between the Bayes-optimal (BO) estimator and RMLE, as the BO setting provides optimal estimation performance and is ideal as a benchmark. Our analysis shows that with appropriate regularizations, RMLE can achieve near-optimal performance in terms of both the estimation error and prediction error, especially when there is a large amount of unlabeled data. These results demonstrate that the l2 regularization term plays an effective role in estimation and prediction in SSL approaches.


Machine learning the Ising transition: A comparison between discriminative and generative approaches

arXiv.org Artificial Intelligence

The detection of phase transitions is a central task in many-body physics. To automate this process, the task can be phrased as a classification problem. Classification problems can be approached in two fundamentally distinct ways: through either a discriminative or a generative method. In general, it is unclear which of these two approaches is most suitable for a given problem. The choice is expected to depend on factors such as the availability of system knowledge, dataset size, desired accuracy, computational resources, and other considerations. In this work, we answer the question of how one should approach the solution of phase-classification problems by performing a numerical case study on the thermal phase transition in the classical two-dimensional square-lattice ferromagnetic Ising model.


Comprehensive Survey of Reinforcement Learning: From Algorithms to Practical Challenges

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) has emerged as a powerful paradigm in Artificial Intelligence (AI), enabling agents to learn optimal behaviors through interactions with their environments. Drawing from the foundations of trial and error, RL equips agents to make informed decisions through feedback in the form of rewards or penalties. This paper presents a comprehensive survey of RL, meticulously analyzing a wide range of algorithms, from foundational tabular methods to advanced Deep Reinforcement Learning (DRL) techniques. We categorize and evaluate these algorithms based on key criteria such as scalability, sample efficiency, and suitability. We compare the strengths and weaknesses of the methods in diverse settings. Additionally, we offer practical insights into the selection and implementation of RL algorithms, addressing common challenges like convergence, stability, and the exploration-exploitation dilemma. This paper serves as a comprehensive reference for researchers and practitioners aiming to harness the full potential of RL in solving complex, real-world problems.


Investigating Plausibility of Biologically Inspired Bayesian Learning in ANNs

arXiv.org Artificial Intelligence

Catastrophic forgetting has been the leading issue in the domain of lifelong learning in artificial systems. Current artificial systems are reasonably good at learning domains they have seen before; however, as soon as they encounter something new, they either go through a significant performance deterioration or, if you try to teach them the new distribution of data, they forget what they have learned before. Additionally, they are also prone to being overly confident when performing inference on seen as well as unseen data, causing significant reliability issues when lives are at stake. Therefore, it is extremely important to dig into this problem and formulate an approach that will be continually adaptable as well as reliable. If we move away from the engineering domain of such systems and look into biological systems, we can see that these systems are very efficient at computing the reliance as well as the uncertainty of accurate predictions, which further helps them refine inference in a lifelong setting. These systems are not perfect; however, they do give us a solid understanding of reasoning under uncertainty, which takes us to the domain of Bayesian reasoning. We incorporate this Bayesian inference with a thresholding mechanism so as to mimic more biologically inspired models, but only at the spatial level. Further, we reproduce a recent study on Bayesian Inference with Spiking Neural Networks for Continual Learning to compare against it as a suitable biologically inspired Bayesian framework. Overall, we investigate the plausibility of biologically inspired Bayesian Learning in artificial systems on a vision dataset, MNIST, and show relative performance improvement under the conditions when the model is forced to predict versus when the model is not.


NeuroAI for AI Safety

arXiv.org Artificial Intelligence

As AI systems become increasingly powerful, the need for safe AI has become more pressing. Humans are an attractive model for AI safety: as the only known agents capable of general intelligence, they perform robustly even under conditions that deviate significantly from prior experiences, explore the world safely, understand pragmatics, and can cooperate to meet their intrinsic goals. Intelligence, when coupled with cooperation and safety mechanisms, can drive sustained progress and well-being. These properties are a function of the architecture of the brain and the learning algorithms it implements. Neuroscience may thus hold important keys to technical AI safety that are currently underexplored and underutilized. In this roadmap, we highlight and critically evaluate several paths toward AI safety inspired by neuroscience: emulating the brain's representations, information processing, and architecture; building robust sensory and motor systems from imitating brain data and bodies; fine-tuning AI systems on brain data; advancing interpretability using neuroscience methods; and scaling up cognitively-inspired architectures. We make several concrete recommendations for how neuroscience can positively impact AI safety.


Streamlining Prediction in Bayesian Deep Learning

arXiv.org Artificial Intelligence

The rising interest in Bayesian deep learning (BDL) has led to a plethora of methods for estimating the posterior distribution. However, efficient computation of inferences, such as predictions, has been largely overlooked, with Monte Carlo integration remaining the standard. In this work we examine streamlining prediction in BDL through a single forward pass without sampling. For this, we use local linearisation on activation functions and local Gaussian approximations at linear layers, which allows us to analytically compute an approximation to the posterior predictive distribution. We showcase our approach for both MLPs and transformers, such as ViT and GPT-2, and assess its performance on regression and classification tasks.

Recent progress and adoption of deep learning models has led to a sharp increase of interest in improving their reliability and robustness. In applications such as aided medical diagnosis (Begoli et al., 2019), autonomous driving (Michelmore et al., 2020), or supporting scientific discovery (Psaros et al., 2023), providing reliable and robust predictions as well as identifying failure modes is vital. A principled approach to address these challenges is the use of Bayesian deep learning (BDL, Wilson & Izmailov, 2020; Papamarkou et al., 2024), which promises a plug & play framework for uncertainty quantification. The key challenges associated with BDL can roughly be divided into three parts: (i) defining a meaningful prior, (ii) estimating the posterior distribution, and (iii) performing inferences of interest, e.g., making predictions for unseen data, detecting out-of-distribution settings, or analysing model sensitivities. While constructing a meaningful prior is an important research direction (Nalisnick, 2018; Meronen et al., 2021; Fortuin et al., 2021; Tran et al., 2022), it has been argued that the differentiating aspect of Bayesian deep learning is marginalisation (Wilson & Izmailov, 2020; Wilson, 2020) rather than the prior itself.

Figure 1: Our streamlined approach allows for practical outlier detection and sensitivity analysis. Locally linearizing the network function with local Gaussian approximations enables many relevant inference tasks to be solved analytically, helping render BDL a practical tool for downstream tasks.
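
To make the idea of a sampling-free forward pass concrete, here is a minimal sketch of moment propagation through a small MLP. It is not the paper's implementation: it assumes independent Gaussian weights, deterministic biases, a diagonal covariance for every activation vector, and tanh activations linearised at the incoming mean. All names (`linear_layer_moments`, `tanh_moments`, `predict`) and the toy layer sizes are hypothetical.

```python
import numpy as np


def linear_layer_moments(m, s, W_mean, W_var, b):
    """Mean and variance of W x + b.

    Assumes W_ij ~ N(W_mean_ij, W_var_ij) independently, and an input x with
    mean m and independent components of variance s. The first two moments are
    exact under these assumptions; treating the output as Gaussian is the
    local Gaussian approximation.
    """
    out_mean = W_mean @ m + b
    out_var = (W_mean ** 2) @ s + W_var @ (m ** 2) + W_var @ s
    return out_mean, out_var


def tanh_moments(m, s):
    """Propagate mean and variance through tanh via a first-order expansion at m."""
    g = np.tanh(m)
    slope = 1.0 - g ** 2  # derivative of tanh evaluated at the mean
    return g, slope ** 2 * s


def predict(x, layers):
    """Single forward pass returning an approximate predictive mean and variance."""
    m, s = x, np.zeros_like(x)  # the input itself is treated as noise-free
    for i, (W_mean, W_var, b) in enumerate(layers):
        m, s = linear_layer_moments(m, s, W_mean, W_var, b)
        if i < len(layers) - 1:  # no activation after the output layer
            m, s = tanh_moments(m, s)
    return m, s


rng = np.random.default_rng(0)
x = np.array([0.5, -1.0])
W1_mean, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2_mean, b2 = rng.normal(size=(1, 3)), np.zeros(1)
layers = [
    (W1_mean, np.full((3, 2), 0.01), b1),
    (W2_mean, np.full((1, 3), 0.01), b2),
]
pred_mean, pred_var = predict(x, layers)
print("predictive mean:", pred_mean, "predictive variance:", pred_var)
```

Because the mean path only ever uses the weight means, the predictive mean here coincides with an ordinary forward pass at the mean weights; the variance path is what the linearisation adds.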


Probabilistic size-and-shape functional mixed models

arXiv.org Machine Learning

The reliable recovery and uncertainty quantification of a fixed effect function $\mu$ in a functional mixed model, for modelling population- and object-level variability in noisily observed functional data, is a notoriously challenging task: variations along the $x$ and $y$ axes are confounded with additive measurement error, and cannot in general be disentangled. The question then as to what properties of $\mu$ may be reliably recovered becomes important. We demonstrate that it is possible to recover the size-and-shape of a square-integrable $\mu$ under a Bayesian functional mixed model. The size-and-shape of $\mu$ is a geometric property invariant to a family of space-time unitary transformations, viewed as rotations of the Hilbert space, that jointly transform the $x$ and $y$ axes. A random object-level unitary transformation then captures size-and-shape \emph{preserving} deviations of $\mu$ from an individual function, while a random linear term and measurement error capture size-and-shape \emph{altering} deviations. The model is regularized by appropriate priors on the unitary transformations, posterior summaries of which may then be suitably interpreted as optimal data-driven rotations of a fixed orthonormal basis for the Hilbert space. Our numerical experiments demonstrate utility of the proposed model, and superiority over the current state-of-the-art.