Streamlining Prediction in Bayesian Deep Learning

Rui Li, Marcus Klasson, Arno Solin, Martin Trapp

arXiv.org Artificial Intelligence 

The rising interest in Bayesian deep learning (BDL) has led to a plethora of methods for estimating the posterior distribution. However, efficient computation of inferences, such as predictions, has been largely overlooked, with Monte Carlo integration remaining the standard. In this work we examine streamlining prediction in BDL through a single forward pass without sampling. To this end, we use local linearisation of activation functions and local Gaussian approximations at linear layers, which allows us to analytically compute an approximation to the posterior predictive distribution. We showcase our approach on both MLPs and transformers, such as ViT and GPT-2, and assess its performance on regression and classification tasks.

Recent progress in and adoption of deep learning models has led to a sharp increase of interest in improving their reliability and robustness. In applications such as aided medical diagnosis (Begoli et al., 2019), autonomous driving (Michelmore et al., 2020), or supporting scientific discovery (Psaros et al., 2023), providing reliable and robust predictions, as well as identifying failure modes, is vital. A principled approach to addressing these challenges is Bayesian deep learning (BDL; Wilson & Izmailov, 2020; Papamarkou et al., 2024), which promises a plug-and-play framework for uncertainty quantification. The key challenges associated with BDL can roughly be divided into three parts: (i) defining a meaningful prior, (ii) estimating the posterior distribution, and (iii) performing inferences of interest, e.g., making predictions for unseen data, detecting out-of-distribution settings, or analysing model sensitivities. While constructing a meaningful prior is an important research direction (Nalisnick, 2018; Meronen et al., 2021; Fortuin et al., 2021; Tran et al., 2022), it has been argued that the differentiating aspect of Bayesian deep learning is marginalisation (Wilson & Izmailov, 2020; Wilson, 2020) rather than the prior itself.

Figure 1: Our streamlined approach allows for practical outlier detection and sensitivity analysis. Locally linearising the network function with local Gaussian approximations enables many relevant inference tasks to be solved analytically, helping render BDL a practical tool for downstream tasks.
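To make the single-forward-pass idea concrete, the following is a minimal NumPy sketch of analytic moment propagation: a diagonal Gaussian over activations is pushed through a linear layer with a mean-field Gaussian weight posterior, and through a ReLU via first-order (delta-method) linearisation around the mean. The function names, the mean-field assumption, and all shapes and values are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import numpy as np

def propagate_linear(mu_x, var_x, W_mean, W_var, b_mean, b_var):
    """Push a diagonal Gaussian N(mu_x, diag(var_x)) through y = W x + b,
    where W_ij ~ N(M_ij, S_ij) independently of x (mean-field assumption).

    E[y_i]   = sum_j M_ij mu_j + E[b_i]
    Var[y_i] = sum_j (M_ij^2 var_j + S_ij (mu_j^2 + var_j)) + Var[b_i]
    """
    mu_y = W_mean @ mu_x + b_mean
    var_y = (W_mean ** 2) @ var_x + W_var @ (mu_x ** 2 + var_x) + b_var
    return mu_y, var_y

def propagate_relu_linearised(mu_x, var_x):
    """Local linearisation of ReLU around the input mean:
    f(x) ~ f(mu) + f'(mu)(x - mu), so the output is again Gaussian
    with mean f(mu) and variance f'(mu)^2 * var."""
    grad = (mu_x > 0).astype(mu_x.dtype)  # ReLU derivative at the mean
    return np.maximum(mu_x, 0.0), grad ** 2 * var_x

# Toy two-layer MLP: one deterministic forward pass yields an
# approximate predictive mean and variance, with no sampling.
rng = np.random.default_rng(0)
mu, var = rng.normal(size=4), np.zeros(4)        # deterministic input
for d_in, d_out in [(4, 8), (8, 1)]:
    W_m = rng.normal(size=(d_out, d_in)) * 0.3   # posterior weight means
    W_v = np.full((d_out, d_in), 0.01)           # posterior weight variances
    mu, var = propagate_linear(mu, var, W_m, W_v,
                               np.zeros(d_out), np.full(d_out, 0.01))
    if d_out != 1:                               # hidden layers only
        mu, var = propagate_relu_linearised(mu, var)
print(mu, var)  # approximate posterior predictive mean and variance
```

Compared with Monte Carlo integration, which would rerun the network for each of many weight samples, this sketch computes the two moments in one pass; the trade-off is the Gaussian and local-linearity assumptions at every layer.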
