Gerwinn, Sebastian
Estimation of Counterfactual Interventions under Uncertainties
Weilbach, Juliane, Gerwinn, Sebastian, Kandemir, Melih, Fraenzle, Martin
Counterfactual analysis is intuitively performed by humans on a daily basis, e.g. "What should I have done differently to get the loan approved?". Such counterfactual questions also steer the formulation of scientific hypotheses. More formally, counterfactual analysis provides insights about potential improvements of a system by inferring the effects of hypothetical interventions on a past observation of the system's behaviour, which plays a prominent role in a variety of industrial applications. Due to the hypothetical nature of such analysis, counterfactual distributions are inherently ambiguous. This ambiguity is particularly challenging in continuous settings, in which a continuum of explanations exists for the same observation. In this paper, we address this problem by following a hierarchical Bayesian approach that explicitly models such uncertainty. In particular, we derive counterfactual distributions for a Bayesian Warped Gaussian Process, thereby allowing for non-Gaussian distributions and non-additive noise. We illustrate the properties of our approach on a synthetic and on a semi-synthetic example and show its performance when used within an algorithmic recourse downstream task.
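The three-step counterfactual procedure underlying such analyses (abduction, action, prediction) can be sketched on a toy additive-noise structural causal model. The linear mechanism and the values below are illustrative placeholders, not the Bayesian Warped Gaussian Process of the paper, where abduction yields a full posterior rather than a point value:

```python
# Toy additive-noise structural causal model (illustrative only):
#   X = U_x,  Y = f(X) + U_y,  with f(x) = 2 * x
def f(x):
    return 2.0 * x

# Factual observation
x_obs, y_obs = 1.0, 2.5

# 1) Abduction: infer the exogenous noise consistent with the observation.
#    Additive noise makes this a point value; with the paper's
#    non-additive noise it would be a distribution over explanations.
u_y = y_obs - f(x_obs)

# 2) Action: hypothetical intervention do(X = x_cf)
x_cf = 0.0

# 3) Prediction: push the abducted noise through the intervened model
y_cf = f(x_cf) + u_y
print(y_cf)  # 0.5
```

The ambiguity the paper addresses arises exactly in step 1: when many noise values explain the same observation, the counterfactual outcome becomes a distribution rather than a single number.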
Learning Partially Known Stochastic Dynamics with Empirical PAC Bayes
Haussmann, Manuel, Gerwinn, Sebastian, Look, Andreas, Rakitsch, Barbara, Kandemir, Melih
Neural Stochastic Differential Equations model a dynamical environment with neural nets assigned to their drift and diffusion terms. The high expressive power of their nonlinearity comes at the expense of instability in the identification of the large set of free parameters. This paper presents a recipe to improve the prediction accuracy of such models in three steps: i) accounting for epistemic uncertainty by assuming probabilistic weights,

In the following, we assume to have access to a differential equation system that describes the dynamics of the target environment with low fidelity, e.g. by describing the vector field on a reduced dimensionality, by ignoring detailed models of some system components, or by avoiding certain dependencies for computational feasibility. We incorporate the ODE system provided by the domain expert into a nonlinear system identification engine, which we choose to be a Bayesian Neural Stochastic Differential Equation (BNSDE) to cover a large scope of dynamical systems, resulting in a hybrid model.
Bayesian Prior Networks with PAC Training
Haussmann, Manuel, Gerwinn, Sebastian, Kandemir, Melih
We propose to train Bayesian Neural Networks (BNNs) by empirical Bayes as an alternative to posterior weight inference. By approximately marginalizing out an i.i.d. realization of a finite number of sibling weights per data-point using the Central Limit Theorem (CLT), we attain a scalable and effective Bayesian deep predictor. This approach directly models the posterior predictive distribution, bypassing the intractable posterior weight inference step. However, it introduces a prohibitively large number of hyperparameters for stable training. As the prior weights are marginalized and hyperparameters are optimized, the model also no longer provides a means to incorporate prior knowledge. We overcome both of these drawbacks by deriving a trivial PAC bound that comprises the marginal likelihood of the predictor and a complexity penalty. The outcome integrates organically into the prior networks framework, bringing about an effective and holistic Bayesian treatment of prediction uncertainty. We observe on various regression, classification, and out-of-domain detection benchmarks that our scalable method provides an improved model fit accompanied by significantly better uncertainty estimates than the state-of-the-art.
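The CLT marginalization step can be illustrated for a single linear unit: with independent probabilistic weights, the pre-activation is approximately Gaussian, so its distribution follows in closed form from the weight moments alone. The layer width and the weight moments below are arbitrary values chosen for illustration, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Single linear unit with d independent probabilistic weights.
d = 500
x = rng.standard_normal(d)
mu_w, sig_w = 0.1, 0.5  # assumed per-weight mean and std

# CLT approximation: a = w . x is approximately Gaussian, so the weights
# can be marginalized in closed form by matching the first two moments.
mean_a = mu_w * x.sum()
var_a = sig_w ** 2 * (x ** 2).sum()

# Monte Carlo check with explicitly sampled weight realizations
W = mu_w + sig_w * rng.standard_normal((100_000, d))
a = W @ x
print(mean_a, a.mean())  # closed form and sampled mean agree closely
```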
Learning Gaussian Processes by Minimizing PAC-Bayesian Generalization Bounds
Reeb, David, Doerr, Andreas, Gerwinn, Sebastian, Rakitsch, Barbara
Gaussian Processes (GPs) are a generic modelling tool for supervised learning. While they have been successfully applied on large datasets, their use in safety-critical applications is hindered by the lack of good performance guarantees. To this end, we propose a method to learn GPs and their sparse approximations by directly optimizing a PAC-Bayesian bound on their generalization performance, instead of maximizing the marginal likelihood. Besides its theoretical appeal, we find in our evaluation that our learning method is robust and yields significantly better generalization guarantees than other common GP approaches on several regression benchmark datasets.
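The idea of selecting GP hyperparameters by minimizing a generalization bound rather than maximizing the marginal likelihood can be sketched as follows. The bounded loss, the generic McAllester-style bound form, and all constants here are simplifying assumptions for illustration, not the exact bound optimized in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data
X = np.linspace(-3, 3, 30)
y = np.sin(X) + 0.1 * rng.standard_normal(X.size)
n = X.size

def rbf(a, b, ls):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def pac_objective(ls, noise=0.1, delta=0.05):
    K = rbf(X, X, ls) + 1e-6 * np.eye(n)
    A = K + noise ** 2 * np.eye(n)
    # GP posterior (Q) at the training inputs; the prior (P) is N(0, K)
    m = K @ np.linalg.solve(A, y)
    S = K - K @ np.linalg.solve(A, K) + 1e-6 * np.eye(n)
    # Bounded empirical risk: squared error squashed into [0, 1]
    emp_risk = np.mean(1.0 - np.exp(-(m - y) ** 2))
    # KL(Q || P) between the two Gaussians
    Kinv_S = np.linalg.solve(K, S)
    _, logdet_ratio = np.linalg.slogdet(Kinv_S)
    kl = 0.5 * (np.trace(Kinv_S) + m @ np.linalg.solve(K, m) - n - logdet_ratio)
    kl = max(kl, 0.0)  # guard against numerical round-off
    # Generic McAllester-style upper bound on the true risk
    return emp_risk + np.sqrt((kl + np.log(2 * np.sqrt(n) / delta)) / (2 * n))

# "Train" by minimizing the bound over candidate hyperparameters
lengthscales = [0.2, 0.5, 1.0, 2.0]
best = min(lengthscales, key=pac_objective)
```

In contrast to marginal-likelihood training, the selected hyperparameters come with a high-probability guarantee on the true risk, which is the property the paper exploits for safety-critical applications.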
Perspectives on the Validation and Verification of Machine Learning Systems in the Context of Highly Automated Vehicles
Damm, Werner (Carl von Ossietzky Universität Oldenburg) | Fränzle, Martin (Carl von Ossietzky Universität Oldenburg) | Gerwinn, Sebastian (OFFIS e. V.) | Kröger, Paul (Carl von Ossietzky Universität Oldenburg)
Algorithms incorporating learned functionality play an increasingly important role for highly automated vehicles. Their impressive performance within environmental perception and other tasks central to automated driving comes at the price of a hitherto unsolved functional verification problem within safety analysis. We propose to combine statistical guarantee statements about the generalisation ability of learning algorithms with the functional architecture as well as constraints about the dynamics and ontology of the physical world, yielding an integrated formulation of the safety verification problem of functional architectures comprising artificial intelligence components. Its formulation as a probabilistic constraint system enables calculation of low-risk manoeuvres. We illustrate the proposed scheme on a simple automotive scenario featuring unreliable environmental perception.
In All Likelihood, Deep Belief Is Not Enough
Theis, Lucas, Gerwinn, Sebastian, Sinz, Fabian, Bethge, Matthias
Statistical models of natural stimuli provide an important tool for researchers in the fields of machine learning and computational neuroscience. A canonical way to quantitatively assess and compare the performance of statistical models is given by the likelihood. One class of statistical models which has recently gained increasing popularity and has been applied to a variety of complex data is that of deep belief networks. Analyses of these models, however, have typically been limited to qualitative analyses based on samples, due to the computationally intractable nature of the model likelihood. Motivated by these circumstances, the present article provides a consistent estimator for the likelihood that is both computationally tractable and simple to apply in practice. Using this estimator, a deep belief network which has been suggested for the modeling of natural image patches is quantitatively investigated and compared to other models of natural image patches. Contrary to earlier claims based on qualitative results, the results presented in this article provide evidence that the model under investigation is not a particularly good model for natural images.
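The general principle behind such a consistent likelihood estimator, importance sampling over the latent variables, can be demonstrated on a tiny latent-variable model where the exact likelihood is still available for comparison. The two-component Gaussian mixture below stands in for an intractable model such as a deep belief network; it is not the estimator construction of the paper itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny latent-variable model: p(x) = sum_h p(h) p(x | h),
# a 2-component Gaussian mixture standing in for an intractable model.
weights = np.array([0.3, 0.7])
means = np.array([-1.0, 2.0])

def norm_pdf(x, mu):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

x = 0.5
exact_px = np.sum(weights * norm_pdf(x, means))  # tractable here only

# Consistent importance-sampling estimate with uniform proposal q(h) = 0.5:
#   p(x) = E_q[ p(h) p(x | h) / q(h) ]
S = 100_000
h = rng.integers(0, 2, size=S)
estimate = np.mean(weights[h] * norm_pdf(x, means[h]) / 0.5)
print(exact_px, estimate)  # the estimate converges to the exact value
```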
A joint maximum-entropy model for binary neural population patterns and continuous signals
Gerwinn, Sebastian, Berens, Philipp, Bethge, Matthias
Second-order maximum-entropy models have recently gained much interest for describing the statistics of binary spike trains. Here, we extend this approach to take continuous stimuli into account as well. By constraining the joint second-order statistics, we obtain a joint Gaussian-Boltzmann distribution of continuous stimuli and binary neural firing patterns, for which we also compute marginal and conditional distributions. This model has the same computational complexity as pure binary models and fitting it to data is a convex problem. We show that the model can be seen as an extension to the classical spike-triggered average/covariance analysis and can be used as a non-linear method for extracting features which a neural population is sensitive to. Further, by calculating the posterior distribution of stimuli given an observed neural response, the model can be used to decode stimuli and yields a natural spike-train metric. Therefore, extending the framework of maximum-entropy models to continuous variables allows us to gain novel insights into the relationship between the firing patterns of neural ensembles and the stimuli they are processing.
Bayesian estimation of orientation preference maps
Gerwinn, Sebastian, White, Leonard, Kaschube, Matthias, Bethge, Matthias, Macke, Jakob H.
Imaging techniques such as optical imaging of intrinsic signals, 2-photon calcium imaging and voltage sensitive dye imaging can be used to measure the functional organization of visual cortex across different spatial scales. Here, we present Bayesian methods based on Gaussian processes for extracting topographic maps from functional imaging data. In particular, we focus on the estimation of orientation preference maps (OPMs) from intrinsic signal imaging data. We model the underlying map as a bivariate Gaussian process, with a prior covariance function that reflects known properties of OPMs, and a noise covariance adjusted to the data. The posterior mean can be interpreted as an optimally smoothed estimate of the map, and can be used for model-based interpolations of the map from sparse measurements. By sampling from the posterior distribution, we can obtain error bars on statistical properties such as preferred orientations, pinwheel locations or pinwheel counts. Finally, the use of an explicit probabilistic model facilitates interpretation of parameters and provides the basis for decoding studies. We demonstrate our model both on simulated data and on intrinsic signal imaging data from ferret visual cortex.