Schmitt, Marvin, Bürkner, Paul-Christian, Köthe, Ullrich, Radev, Stefan T.

Neural density estimators have proven remarkably powerful in performing efficient simulation-based Bayesian inference in various research domains. In particular, the BayesFlow framework uses a two-step approach to enable amortized parameter estimation in settings where the likelihood function is implicitly defined by a simulation program. But how faithful is such inference when simulations are poor representations of reality? In this paper, we conceptualize the types of model misspecification arising in simulation-based inference and systematically investigate the performance of the BayesFlow framework under these misspecifications. We propose an augmented optimization objective which imposes a probabilistic structure on the latent data space and utilize maximum mean discrepancy (MMD) to detect potentially catastrophic misspecifications during inference undermining the validity of the obtained results. We verify our detection criterion on a number of artificial and realistic misspecifications, ranging from toy conjugate models to complex models of decision making and disease outbreak dynamics applied to real data. Further, we show that posterior inference errors increase as a function of the distance between the true data-generating distribution and the typical set of simulations in the latent summary space. Thus, we demonstrate the dual utility of MMD as a method for detecting model misspecification and as a proxy for verifying the faithfulness of amortized Bayesian inference.

Ardizzone, Lynton, Kruse, Jakob, Lüth, Carsten, Bracher, Niels, Rother, Carsten, Köthe, Ullrich

We introduce a new architecture called a conditional invertible neural network (cINN), and use it to address the task of diverse image-to-image translation for natural images. This is not easily possible with existing INN models due to some fundamental limitations. The cINN combines the purely generative INN model with an unconstrained feed-forward network, which efficiently preprocesses the conditioning image into maximally informative features. All parameters of a cINN are jointly optimized with a stable, maximum likelihood-based training procedure. Even though INN-based models have received far less attention in the literature than GANs, they have been shown to have some remarkable properties absent in GANs, e.g. apparent immunity to mode collapse. We find that our cINNs leverage these properties for image-to-image translation, demonstrated on day to night translation and image colorization. Furthermore, we take advantage of our bidirectional cINN architecture to explore and manipulate emergent properties of the latent space, such as changing the image style in an intuitive way.

Nölke, Jan-Hinrich, Adler, Tim, Gröhl, Janek, Ardizzone, Lynton, Rother, Carsten, Köthe, Ullrich, Maier-Hein, Lena

Multispectral photoacoustic imaging (PAI) is an emerging imaging modality which enables the recovery of functional tissue parameters such as blood oxygenation. However, the underlying inverse problems are potentially ill-posed, meaning that radically different tissue properties may - in theory - yield comparable measurements. In this work, we present a new approach for handling this specific type of uncertainty by leveraging the concept of conditional invertible neural networks (cINNs). Specifically, we propose going beyond commonly used point estimates for tissue oxygenation and converting single-pixel initial pressure spectra to the full posterior probability density. This way, the inherent ambiguity of a problem can be encoded with multiple modes in the output. Based on the presented architecture, we demonstrate two use cases which leverage this information to not only detect and quantify but also to compensate for uncertainties: (1) photoacoustic device design and (2) optimization of photoacoustic image acquisition. Our in silico studies demonstrate the potential of the proposed methodology to become an important building block for uncertainty-aware reconstruction of physiological parameters with PAI.

Müller, Jens, Schmier, Robert, Ardizzone, Lynton, Rother, Carsten, Köthe, Ullrich

Standard supervised learning breaks down under data distribution shift. However, the principle of independent causal mechanisms (ICM, Peters et al. (2017)) can turn this weakness into an opportunity: one can take advantage of distribution shift between different environments during training in order to obtain more robust models. We propose a new gradient-based learning framework whose objective function is derived from the ICM principle. We show theoretically and experimentally that neural networks trained in this framework focus on relations remaining invariant across environments and ignore unstable ones. Moreover, we prove that the recovered stable relations correspond to the true causal mechanisms under certain conditions. In both regression and classification, the resulting models generalize well to unseen scenarios where traditionally trained models fail.

Radev, Stefan T., D'Alessandro, Marco, Bürkner, Paul-Christian, Mertens, Ulf K., Voss, Andreas, Köthe, Ullrich

Comparing competing mathematical models of complex natural processes is a shared goal among many branches of science. The Bayesian probabilistic framework offers a principled way to perform model comparison and extract useful metrics for guiding decisions. However, many interesting models are intractable with standard Bayesian methods, as they lack a closed-form likelihood function or the likelihood is computationally too expensive to evaluate. With this work, we propose a novel method for performing Bayesian model comparison using specialized deep learning architectures. Our method is purely simulation-based and circumvents the step of explicitly fitting all alternative models under consideration to each observed dataset. Moreover, it involves no hand-crafted summary statistics of the data and is designed to amortize the cost of simulation over multiple models and observable datasets. This makes the method applicable in scenarios where model fit needs to be assessed for a large number of datasets, so that per-dataset inference is practically infeasible. Finally, we propose a novel way to measure epistemic uncertainty in model comparison problems. We argue that this measure of epistemic uncertainty provides a unique proxy to quantify absolute evidence even in a framework which assumes that the true data-generating model is within a finite set of candidate models.

Radev, Stefan T., Mertens, Ulf K., Voss, Andreass, Ardizzone, Lynton, Köthe, Ullrich

Estimating the parameters of mathematical models is a common problem in almost all branches of science. However, this problem can prove notably difficult when processes and model descriptions become increasingly complex and an explicit likelihood function is not available. With this work, we propose a novel method for globally amortized Bayesian inference based on invertible neural networks which we call BayesFlow. The method uses simulation to learn a global estimator for the probabilistic mapping from observed data to underlying model parameters. A neural network pre-trained in this way can then, without additional training or optimization, infer full posteriors on arbitrary many real data sets involving the same model family. In addition, our method incorporates a summary network trained to embed the observed data into maximally informative summary statistics. Learning summary statistics from data makes the method applicable to modeling scenarios where standard inference techniques with hand-crafted summary statistics fail. We demonstrate the utility of BayesFlow on challenging intractable models from population dynamics, epidemiology, cognitive science and ecology. We argue that BayesFlow provides a general framework for building reusable Bayesian parameter estimation machines for any process model from which data can be simulated.

Ardizzone, Lynton, Mackowiak, Radek, Köthe, Ullrich, Rother, Carsten

The Information Bottleneck (IB) principle offers a unified approach to many learning and prediction problems. Although optimal in an information-theoretic sense, practical applications of IB are hampered by a lack of accurate high-dimensional estimators of mutual information, its main constituent. We propose to combine IB with invertible neural networks (INNs), which for the first time allows exact calculation of the required mutual information. Applied to classification, our proposed method results in a generative classifier we call IB-INN. It accurately models the class conditional likelihoods, generalizes well to unseen data and reliably recognizes out-of-distribution examples. In contrast to existing generative classifiers, these advantages incur only minor reductions in classification accuracy in comparison to corresponding discriminative methods such as feed-forward networks. Furthermore, we provide insight into why IB-INNs are superior to other generative architectures and training procedures and show experimentally that our method outperforms alternative models of comparable complexity.

Sorrenson, Peter, Rother, Carsten, Köthe, Ullrich

A central question of representation learning asks under which conditions it is possible to reconstruct the true latent variables of an arbitrarily complex generative process. Recent breakthrough work by Khemakhem et al. (2019) on nonlinear ICA has answered this question for a broad class of conditional generative processes. We extend this important result in a direction relevant for application to real-world data. First, we generalize the theory to the case of unknown intrinsic problem dimension and prove that in some special (but not very restrictive) cases, informative latent variables will be automatically separated from noise by an estimating model. Furthermore, the recovered informative latent variables will be in one-to-one correspondence with the true latent variables of the generating process, up to a trivial component-wise transformation. Second, we introduce a modification of the RealNVP invertible neural network architecture (Dinh et al., 2016) which is particularly suitable for this type of problem: the General Incompressible-flow Network (GIN). Experiments on artificial data and EMNIST demonstrate that theoretical predictions are indeed verified in practice. In particular, we provide a detailed set of exactly 22 informative latent variables extracted from EMNIST. Deep latent-variable models promise to unlock the key factors of variation within a dataset, opening a window to interpretation and granting the power to manipulate data in an intuitive fashion. The theory of identifiability in linear independent component analysis (ICA) (Comon, 1994) tells us when this is possible, if we restrict the model to a linear transformation, but until recently there was no corresponding theory for the highly nonlinear models needed to manipulate complex data. This changed with the recent breakthrough work by Khemakhem et al. (2019), which showed that under relatively mild conditions, it is possible to recover the joint data and latent space distribution, up to a simple transformation in the latent space. The key requirement is that the generating process is conditioned on a variable which is observed along with the data. This condition could be a class label, time index of a time series, or any other piece of information additional to the data.

Adler, Tim J., Ayala, Leonardo, Ardizzone, Lynton, Kenngott, Hannes G., Vemuri, Anant, Müller-Stich, Beat P., Rother, Carsten, Köthe, Ullrich, Maier-Hein, Lena

Multispectral optical imaging is becoming a key tool in the operating room. Recent research has shown that machine learning algorithms can be used to convert pixel-wise reflectance measurements to tissue parameters, such as oxygenation. However, the accuracy of these algorithms can only be guaranteed if the spectra acquired during surgery match the ones seen during training. It is therefore of great interest to detect so-called out of distribution (OoD) spectra to prevent the algorithm from presenting spurious results. In this paper we present an information theory based approach to OoD detection based on the widely applicable information criterion (WAIC). Our work builds upon recent methodology related to invertible neural networks (INN). Specifically, we make use of an ensemble of INNs as we need their tractable Jacobians in order to compute the WAIC. Comprehensive experiments with in silico, and in vivo multispectral imaging data indicate that our approach is well-suited for OoD detection. Our method could thus be an important step towards reliable functional imaging in the operating room.

Detommaso, Gianluca, Kruse, Jakob, Ardizzone, Lynton, Rother, Carsten, Köthe, Ullrich, Scheichl, Robert

In this paper, we introduce Hierarchical Invertible Neural Transport (HINT), an algorithm that merges Invertible Neural Networks and optimal transport to sample from a posterior distribution in a Bayesian framework. This method exploits a hierarchical architecture to construct a Knothe-Rosenblatt transport map between an arbitrary density and the joint density of hidden variables and observations. After training the map, samples from the posterior can be immediately recovered for any contingent observation. Any underlying model evaluation can be performed fully offline from training without the need of a model-gradient. Furthermore, no analytical evaluation of the prior is necessary, which makes HINT an ideal candidate for sequential Bayesian inference. We demonstrate the efficacy of HINT on two numerical experiments.