Trapp, Martin
Post-hoc Probabilistic Vision-Language Models
Baumann, Anton, Li, Rui, Klasson, Marcus, Mentu, Santeri, Karthik, Shyamgopal, Akata, Zeynep, Solin, Arno, Trapp, Martin
Vision-language models (VLMs), such as CLIP and SigLIP, have achieved remarkable success in classification, retrieval, and generative tasks. To do so, VLMs deterministically map images and text descriptions into a joint latent space in which their similarity is assessed using cosine similarity. However, such a deterministic mapping fails to capture uncertainties over concepts arising from domain shifts when the models are used in downstream tasks. In this work, we propose post-hoc uncertainty estimation in VLMs that does not require additional training. Our method leverages a Bayesian posterior approximation over the last layers in VLMs and analytically quantifies uncertainties over cosine similarities. We demonstrate its effectiveness for uncertainty quantification and support set selection in active learning. Compared to baselines, we obtain improved and well-calibrated predictive uncertainties, interpretable uncertainty estimates, and sample-efficient active learning. Our results show promise for safety-critical applications of large-scale models.
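A minimal sketch of the quantity involved, assuming Gaussian uncertainty over the image and text embeddings (e.g., from a last-layer posterior approximation): the code below propagates that uncertainty into the cosine similarity via plain Monte Carlo sampling. The paper derives analytic expressions instead; all function names and the toy dimensions here are illustrative.

```python
# Minimal sketch (not the paper's exact method): propagate Gaussian
# uncertainty over image/text embeddings into the cosine similarity
# by Monte Carlo sampling. This only illustrates the quantity the
# paper quantifies analytically.
import numpy as np

rng = np.random.default_rng(0)

def sample_cosine_similarity(mu_img, cov_img, mu_txt, cov_txt, n_samples=10_000):
    """Sample cosine similarities under Gaussian embedding uncertainty."""
    z_img = rng.multivariate_normal(mu_img, cov_img, size=n_samples)
    z_txt = rng.multivariate_normal(mu_txt, cov_txt, size=n_samples)
    num = np.sum(z_img * z_txt, axis=1)
    den = np.linalg.norm(z_img, axis=1) * np.linalg.norm(z_txt, axis=1)
    return num / den

# Toy 4-dimensional embeddings with isotropic uncertainty.
d = 4
mu_img, mu_txt = rng.normal(size=d), rng.normal(size=d)
sims = sample_cosine_similarity(mu_img, 0.1 * np.eye(d), mu_txt, 0.1 * np.eye(d))
print(f"cosine similarity: mean={sims.mean():.3f}, std={sims.std():.3f}")
```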
Streamlining Prediction in Bayesian Deep Learning
Li, Rui, Klasson, Marcus, Solin, Arno, Trapp, Martin
The rising interest in Bayesian deep learning (BDL) has led to a plethora of methods for estimating the posterior distribution. However, efficient computation of inferences, such as predictions, has been largely overlooked, with Monte Carlo integration remaining the standard. In this work, we examine streamlining prediction in BDL through a single forward pass without sampling. To this end, we use local linearisation of activation functions and local Gaussian approximations at linear layers, allowing us to analytically compute an approximation to the posterior predictive distribution. We showcase our approach for both MLPs and transformers, such as ViT and GPT-2, and assess its performance on regression and classification tasks.

Recent progress in and adoption of deep learning models has led to a sharp increase of interest in improving their reliability and robustness. In applications such as aided medical diagnosis (Begoli et al., 2019), autonomous driving (Michelmore et al., 2020), or supporting scientific discovery (Psaros et al., 2023), providing reliable and robust predictions as well as identifying failure modes is vital. A principled approach to address these challenges is the use of Bayesian deep learning (BDL; Wilson & Izmailov, 2020; Papamarkou et al., 2024), which promises a plug-and-play framework for uncertainty quantification. The key challenges associated with BDL can roughly be divided into three parts: (i) defining a meaningful prior, (ii) estimating the posterior distribution, and (iii) performing inferences of interest, e.g., making predictions for unseen data, detecting out-of-distribution settings, or analysing model sensitivities. While constructing a meaningful prior is an important research direction (Nalisnick, 2018; Meronen et al., 2021; Fortuin et al., 2021; Tran et al., 2022), it has been argued that the differentiating aspect of Bayesian deep learning is marginalisation (Wilson & Izmailov, 2020; Wilson, 2020) rather than the prior itself.

Figure 1: Our streamlined approach allows for practical outlier detection and sensitivity analysis. Locally linearising the network function with local Gaussian approximations enables many relevant inference tasks to be solved analytically, helping render BDL a practical tool for downstream tasks.
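A minimal sketch of the kind of analytic predictive this enables, assuming a Gaussian posterior N(theta_MAP, Sigma) over the weights and a linearisation of the network: the predictive mean is f(x; theta_MAP) and the predictive covariance is J(x) Sigma J(x)^T, with J the Jacobian with respect to the weights. The toy model and function names below are illustrative; the paper additionally uses local Gaussian approximations at activations to avoid sampling altogether.

```python
# Sketch of a linearised (GLM-style) predictive under a Gaussian weight
# posterior N(theta_map, Sigma): mean f(x; theta_map), covariance
# J(x) Sigma J(x)^T, with J the Jacobian of the output w.r.t. the weights.
import numpy as np

def predictive_moments(f, jacobian, x, theta_map, Sigma, noise_var=0.0):
    mean = f(x, theta_map)
    J = jacobian(x, theta_map)          # shape: (n_outputs, n_params)
    cov = J @ Sigma @ J.T + noise_var * np.eye(J.shape[0])
    return mean, cov

# Toy example: linear model f(x) = W x, so the linearisation is exact.
def f(x, theta):
    return theta.reshape(2, 3) @ x

def jacobian(x, theta):
    # d f_i / d theta_{3i+k} = x_k, i.e. a block-diagonal Jacobian.
    return np.kron(np.eye(2), x[None, :])

theta_map = np.arange(6.0)
Sigma = 0.01 * np.eye(6)
mean, cov = predictive_moments(f, jacobian, np.array([1.0, 2.0, 3.0]), theta_map, Sigma)
print("predictive mean:", mean, "predictive variance:", np.diag(cov))
```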
On Hardware-efficient Inference in Probabilistic Circuits
Yao, Lingyun, Trapp, Martin, Leslin, Jelin, Singh, Gaurav, Zhang, Peng, Periasamy, Karthekeyan, Andraud, Martin
Probabilistic circuits (PCs) offer a promising avenue for performing embedded reasoning under uncertainty. They support efficient and exact computation of various probabilistic inference tasks by design. Hence, hardware-efficient computation of PCs is highly interesting for edge computing applications. As computations in PCs are based on arithmetic with probability values, they are typically performed in the log domain to avoid underflow. Unfortunately, performing the log operation on hardware is costly. Hence, prior work has focused on computations in the linear domain, resulting in high resolution and energy requirements. This work proposes the first dedicated approximate computing framework for PCs that allows for low-resolution logarithm computations. We leverage an addition-as-int approximation, resulting in linear-domain PC computation with simple hardware elements. Further, we provide a theoretical approximation error analysis and present an error compensation mechanism. Empirically, our method obtains up to 357x and 649x energy reductions on custom hardware for evidence and MAP queries, respectively, with little or no computational error.
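A minimal sketch of the addition-as-int idea as we understand it: the bit pattern of an IEEE-754 float, read as an integer, is approximately an affine function of log2 of its value, so adding the integer representations of two positive floats approximates their product up to a constant offset. The snippet below only illustrates this building block; the paper's full framework, error analysis, and compensation mechanism are not reproduced.

```python
# Sketch of the "addition as int" trick: reading a float's bit pattern
# as an integer gives roughly an affine function of log2(x), so adding
# two such integers (and removing one exponent bias) approximates the
# product of the two floats using only integer addition.
import struct

BIAS = 127 << 23  # float32 exponent bias, shifted into place

def float_to_int(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def int_to_float(i: int) -> float:
    return struct.unpack("<f", struct.pack("<I", i))[0]

def approx_mul(a: float, b: float) -> float:
    """Approximate a * b for positive floats via integer addition."""
    return int_to_float(float_to_int(a) + float_to_int(b) - BIAS)

for a, b in [(0.3, 0.5), (0.12, 0.07), (0.9, 0.9)]:
    exact, approx = a * b, approx_mul(a, b)
    print(f"{a} * {b}: exact={exact:.5f}, approx={approx:.5f}, "
          f"rel.err={(approx - exact) / exact:+.2%}")
```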
Flatness Improves Backbone Generalisation in Few-shot Classification
Li, Rui, Trapp, Martin, Klasson, Marcus, Solin, Arno
Deployment of deep neural networks in real-world settings typically requires adaptation to new tasks with few examples. Few-shot classification (FSC) provides a solution to this problem by leveraging pre-trained backbones for fast adaptation to new classes. Surprisingly, most efforts have focused only on developing architectures for easing adaptation to the target domain, without considering the importance of backbone training for good generalisation. We show that flatness-aware backbone training with vanilla fine-tuning results in a simpler yet competitive baseline compared to the state-of-the-art. Our results indicate that for in- and cross-domain FSC, backbone training is crucial to achieving good generalisation across different adaptation methods. We advocate that more care should be taken when training these models.
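The abstract does not commit to a particular optimiser; as one concrete example of flatness-aware training, the sketch below applies a sharpness-aware minimisation (SAM) style update to a toy quadratic loss. It is illustrative only and not the paper's training pipeline.

```python
# Minimal sketch of a SAM-style update on a toy loss: first ascend to a
# nearby perturbed point within an L2 ball of radius rho, then descend
# using the gradient evaluated at that perturbed point.
import numpy as np

H = np.diag([1.0, 50.0])       # ill-conditioned quadratic as a toy loss surface

def loss(w):
    return 0.5 * w @ H @ w

def grad(w):
    return H @ w

def sam_step(w, lr=0.01, rho=0.05):
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation direction
    g_sharp = grad(w + eps)                      # gradient at the perturbed weights
    return w - lr * g_sharp                      # sharpness-aware descent step

w = np.array([1.0, 1.0])
for _ in range(200):
    w = sam_step(w)
print("final weights:", w, "final loss:", loss(w))
```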
Characteristic Circuits
Yu, Zhongjie, Trapp, Martin, Kersting, Kristian
In many real-world scenarios, it is crucial to be able to reliably and efficiently reason under uncertainty while capturing complex relationships in data. Probabilistic circuits (PCs), a prominent family of tractable probabilistic models, offer a remedy to this challenge by composing simple, tractable distributions into a high-dimensional probability distribution. However, learning PCs on heterogeneous data is challenging and densities of some parametric distributions are not available in closed form, limiting their potential use. We introduce characteristic circuits (CCs), a family of tractable probabilistic models providing a unified formalization of distributions over heterogeneous data in the spectral domain. The one-to-one relationship between characteristic functions and probability measures enables us to learn high-dimensional distributions on heterogeneous data domains and facilitates efficient probabilistic inference even when no closed-form density function is available. We show that the structure and parameters of CCs can be learned efficiently from the data and find that CCs outperform state-of-the-art density estimators for heterogeneous data domains on common benchmark data sets.
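A minimal sketch of the underlying idea, with illustrative (not the paper's) function names: leaves are characteristic functions, here a Gaussian and a categorical leaf; product nodes multiply characteristic functions over disjoint variable scopes; and sum nodes form convex combinations.

```python
# Sketch of the characteristic-circuit idea: compose characteristic
# functions (CFs) of simple leaves via products (disjoint scopes) and
# convex combinations (mixtures). Names are illustrative only.
import numpy as np

def cf_gaussian(t, mu, sigma):
    return np.exp(1j * t * mu - 0.5 * (sigma * t) ** 2)

def cf_categorical(t, values, probs):
    return np.sum(np.asarray(probs) * np.exp(1j * t * np.asarray(values)))

def cf_product(t1, t2, leaf1, leaf2):
    # Product node over disjoint scopes {X1}, {X2}: CF factorises.
    return leaf1(t1) * leaf2(t2)

def cf_mixture(t1, t2, weights, components):
    # Sum node: convex combination of component CFs.
    return sum(w * c(t1, t2) for w, c in zip(weights, components))

# A heterogeneous toy circuit: continuous X1 (Gaussian), discrete X2 (categorical).
comp_a = lambda t1, t2: cf_product(t1, t2,
                                   lambda t: cf_gaussian(t, mu=0.0, sigma=1.0),
                                   lambda t: cf_categorical(t, [0, 1, 2], [0.2, 0.5, 0.3]))
comp_b = lambda t1, t2: cf_product(t1, t2,
                                   lambda t: cf_gaussian(t, mu=3.0, sigma=0.5),
                                   lambda t: cf_categorical(t, [0, 1, 2], [0.6, 0.3, 0.1]))

print("CF of the circuit at (0.7, -1.2):", cf_mixture(0.7, -1.2, [0.4, 0.6], [comp_a, comp_b]))
```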
Fixing Overconfidence in Dynamic Neural Networks
Meronen, Lassi, Trapp, Martin, Pilzer, Andrea, Yang, Le, Solin, Arno
Dynamic neural networks are a recent technique that promises a remedy for the increasing size of modern deep learning models by dynamically adapting their computational cost to the difficulty of the inputs. In this way, the model can adjust to a limited computational budget. However, the poor quality of uncertainty estimates in deep learning models makes it difficult to distinguish between hard and easy samples. To address this challenge, we present a computationally efficient approach for post-hoc uncertainty quantification in dynamic neural networks. We show that adequately quantifying and accounting for both aleatoric and epistemic uncertainty through a probabilistic treatment of the last layers improves the predictive performance and aids decision-making when determining the computational budget. In the experiments, we show improvements on CIFAR-100, ImageNet, and Caltech-256 in terms of accuracy, capturing uncertainty, and calibration error.
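A minimal sketch of the decision this supports, assuming a Gaussian over an early exit's logits (e.g., from a post-hoc last-layer approximation): sample logits, average the softmax, and exit early only if the predictive entropy is below a threshold. Names and thresholds are illustrative.

```python
# Sketch of an uncertainty-aware early-exit rule: Bayesian model
# averaging over sampled logits, then thresholding the predictive entropy.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def exit_early(logit_mean, logit_cov, entropy_threshold=0.5, n_samples=1000):
    logits = rng.multivariate_normal(logit_mean, logit_cov, size=n_samples)
    p = softmax(logits).mean(axis=0)                  # averaged predictive
    entropy = -np.sum(p * np.log(p + 1e-12))
    return entropy < entropy_threshold, entropy

confident = exit_early(np.array([4.0, 0.5, 0.2]), 0.05 * np.eye(3))
ambiguous = exit_early(np.array([1.0, 0.9, 0.8]), 0.5 * np.eye(3))
print("confident sample -> exit early:", confident[0], "entropy:", round(confident[1], 3))
print("ambiguous sample -> exit early:", ambiguous[0], "entropy:", round(ambiguous[1], 3))
```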
Transport with Support: Data-Conditional Diffusion Bridges
Tamir, Ella, Trapp, Martin, Solin, Arno
The dynamic Schr\"odinger bridge problem provides an appealing setting for solving constrained time-series data generation tasks posed as optimal transport problems. It consists of learning non-linear diffusion processes using efficient iterative solvers. Recent works have demonstrated state-of-the-art results (eg. in modelling single-cell embryo RNA sequences or sampling from complex posteriors) but are limited to learning bridges with only initial and terminal constraints. Our work extends this paradigm by proposing the Iterative Smoothing Bridge (ISB). We integrate Bayesian filtering and optimal control into learning the diffusion process, enabling the generation of constrained stochastic processes governed by sparse observations at intermediate stages and terminal constraints. We assess the effectiveness of our method on synthetic and real-world data generation tasks and we show that the ISB generalises well to high-dimensional data, is computationally efficient, and provides accurate estimates of the marginals at intermediate and terminal times.
Subtractive Mixture Models via Squaring: Representation and Learning
Loconte, Lorenzo, Sladek, Aleksanteri M., Mengel, Stefan, Trapp, Martin, Solin, Arno, Gillis, Nicolas, Vergari, Antonio
Mixture models are traditionally represented and learned by adding several distributions as components. Allowing mixtures to subtract probability mass or density can drastically reduce the number of components needed to model complex distributions. However, learning such subtractive mixtures while ensuring they still encode a non-negative function is challenging. We investigate how to learn and perform inference on deep subtractive mixtures by squaring them. We do this in the framework of probabilistic circuits, which enables us to represent tensorized mixtures and generalize several other subtractive models. We theoretically prove that the class of squared circuits allowing subtractions can be exponentially more expressive than traditional additive mixtures, and we empirically show this increased expressiveness on a series of real-world distribution estimation tasks.
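A minimal sketch of the squaring idea: a linear combination of densities with a negative coefficient can dip below zero, but its square is a valid unnormalised density. The normalisation is done numerically here; the paper shows how the circuit structure makes normalisation and inference tractable.

```python
# Sketch: a two-component "subtractive" combination of Gaussians is
# negative in places, but squaring it yields a non-negative function
# that can be normalised into a proper density.
import numpy as np

x = np.linspace(-6, 6, 4001)
dx = x[1] - x[0]

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

signed = 1.0 * gaussian(x, 0.0, 2.0) - 0.8 * gaussian(x, 0.0, 0.7)  # subtractive combination
squared = signed ** 2                                               # non-negative by construction
density = squared / (np.sum(squared) * dx)                          # numeric normaliser

print("min of signed combination:", signed.min())        # negative somewhere
print("min of squared density:   ", density.min())       # >= 0 everywhere
print("integrates to approximately:", np.sum(density) * dx)
```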
The Robust Semantic Segmentation UNCV2023 Challenge Results
Yu, Xuanlong, Zuo, Yi, Wang, Zitao, Zhang, Xiaowen, Zhao, Jiaxuan, Yang, Yuting, Jiao, Licheng, Peng, Rui, Wang, Xinyi, Zhang, Junpei, Zhang, Kexin, Liu, Fang, Alcover-Couso, Roberto, SanMiguel, Juan C., Escudero-Viñolo, Marcos, Tian, Hanlin, Matsui, Kenta, Wang, Tianhao, Adan, Fahmy, Gao, Zhitong, He, Xuming, Bouniot, Quentin, Moghaddam, Hossein, Rai, Shyam Nandan, Cermelli, Fabio, Masone, Carlo, Pilzer, Andrea, Ricci, Elisa, Bursuc, Andrei, Solin, Arno, Trapp, Martin, Li, Rui, Yao, Angela, Chen, Wenlong, Simpson, Ivor, Campbell, Neill D. F., Franchi, Gianni
This paper outlines the winning solutions employed in addressing the MUAD uncertainty quantification challenge held at ICCV 2023. The challenge was centred around semantic segmentation in urban environments, with a particular focus on natural adversarial scenarios. The report presents the results of 19 submitted entries, with numerous techniques drawing inspiration from cutting-edge uncertainty quantification methodologies presented at prominent computer vision and machine learning conferences and journals over the past few years. The challenge is first introduced, shedding light on its purpose and objectives, which primarily revolved around enhancing the robustness of semantic segmentation in urban scenes under varying natural adversarial conditions. The report then delves into the top-performing solutions and provides a comprehensive overview of the diverse approaches deployed by all participants. In doing so, it offers readers deeper insight into the range of strategies that can be leveraged to handle the uncertainties inherent in autonomous driving and semantic segmentation, especially within urban environments.
Disentangling Model Multiplicity in Deep Learning
Heljakka, Ari, Trapp, Martin, Kannala, Juho, Solin, Arno
Model multiplicity is a well-known but poorly understood phenomenon that undermines the generalisation guarantees of machine learning models. It appears when two models with similar training-time performance differ in their predictions and real-world performance characteristics. This observed 'predictive' multiplicity (PM) also implies elusive differences in the internals of the models, their 'representational' multiplicity (RM). We introduce a conceptual and experimental setup for analysing RM by measuring activation similarity via singular vector canonical correlation analysis (SVCCA). We show that certain differences in training methods systematically result in larger RM than others, and we evaluate RM and PM over a finite sample as predictors for generalisability. We further correlate RM with PM measured by the variance in i.i.d. and out-of-distribution test predictions in four standard image data sets. Finally, instead of attempting to eliminate RM, we call for its systematic measurement and maximal exposure.
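A minimal sketch of the SVCCA measurement used for representational multiplicity, assuming activation matrices of shape (samples, neurons) from two models: reduce each with an SVD, then compare the retained subspaces via canonical correlation analysis. This is an illustrative implementation, not the paper's exact code.

```python
# Sketch of SVCCA: SVD-reduce each activation matrix to the directions
# explaining most variance, then compute canonical correlations between
# the reduced subspaces and report their mean as a similarity score.
import numpy as np

def svcca(acts1, acts2, keep=0.99):
    """acts*: (n_samples, n_neurons) activation matrices of two models."""
    def reduce(a):
        a = a - a.mean(axis=0)
        u, s, _ = np.linalg.svd(a, full_matrices=False)
        k = np.searchsorted(np.cumsum(s**2) / np.sum(s**2), keep) + 1
        return u[:, :k] * s[:k]                        # top singular directions
    x, y = reduce(acts1), reduce(acts2)
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    corrs = np.linalg.svd(qx.T @ qy, compute_uv=False)  # canonical correlations
    return corrs.mean()

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 64))
similar = base @ rng.normal(size=(64, 64)) + 0.1 * rng.normal(size=(500, 64))
unrelated = rng.normal(size=(500, 64))
print("similar representations:  ", round(svcca(base, similar), 3))
print("unrelated representations:", round(svcca(base, unrelated), 3))
```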