Bayesian Inference
A Survey of Algorithms for Black-Box Safety Validation
Corso, Anthony, Moss, Robert J., Koren, Mark, Lee, Ritchie, Kochenderfer, Mykel J.
Autonomous and semi-autonomous systems for safety-critical applications require rigorous testing before deployment. Due to the complexity of these systems, formal verification may be impossible and real-world testing may be dangerous during development. Therefore, simulation-based techniques have been developed that treat the system under test as a black box during testing. Safety validation tasks include finding disturbances to the system that cause it to fail (falsification), finding the most-likely failure, and estimating the probability that the system fails. Motivated by the prevalence of safety-critical artificial intelligence, this work provides a survey of state-of-the-art safety validation techniques with a focus on applied algorithms and their modifications for the safety validation problem. We present and discuss algorithms in the domains of optimization, path planning, reinforcement learning, and importance sampling. Problem decomposition techniques are presented to help scale algorithms to large state spaces, and a brief overview of safety-critical applications is given, including autonomous vehicles and aircraft collision avoidance systems. Finally, we present a survey of existing academic and commercially available safety validation tools.
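One of the tasks named above, estimating the probability that the system fails, can be illustrated with importance sampling. The following is a minimal sketch and is not taken from the survey: it assumes a hypothetical black-box system that fails whenever a scalar, standard-normal disturbance exceeds a threshold, and it uses a proposal distribution centred on that threshold. The function name `failure_prob_is` and all parameter values are illustrative.

```python
import numpy as np
from math import erfc, sqrt


def failure_prob_is(threshold, n_samples, seed=0):
    """Importance-sampling estimate of P(x > threshold) for x ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    # Proposal q = N(threshold, 1): samples land near the failure region.
    x = rng.normal(loc=threshold, scale=1.0, size=n_samples)
    # Log importance weight log p(x) - log q(x); normalising constants cancel.
    log_w = -0.5 * x**2 + 0.5 * (x - threshold) ** 2
    # Hypothetical black-box failure criterion (an assumption of this sketch).
    fails = x > threshold
    return float(np.mean(fails * np.exp(log_w)))


estimate = failure_prob_is(4.0, 100_000)
# Closed-form tail probability of a standard normal, for comparison.
exact = 0.5 * erfc(4.0 / sqrt(2.0))
```

Plain Monte Carlo with the same budget would see only a handful of failures at this threshold, which is why a shifted proposal is used here.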
Robust model training and generalisation with Studentising flows
Alexanderson, Simon, Henter, Gustav Eje
Normalising flows are tractable probabilistic models that leverage the power of deep learning to describe a wide parametric family of distributions, all while remaining trainable using maximum likelihood. We discuss how these methods can be further improved based on insights from robust (in particular, resistant) statistics. Specifically, we propose to endow flow-based models with fat-tailed latent distributions such as multivariate Student's $t$, as a simple drop-in replacement for the Gaussian distribution used by conventional normalising flows. While robustness brings many advantages, this paper explores two of them: 1) We describe how using fatter-tailed base distributions can give benefits similar to gradient clipping, but without compromising the asymptotic consistency of the method. 2) We also discuss how robust ideas lead to models with reduced generalisation gap and improved held-out data likelihood. Experiments on several different datasets confirm the efficacy of the proposed approach in both regards.
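The comparison with gradient clipping can be made concrete in one dimension. The sketch below is a simplification, not the paper's multivariate model: it compares the gradient of the negative log-density with respect to a latent value z for a standard Gaussian and for a univariate Student's t with ν degrees of freedom. The Gaussian gradient is z and grows without bound, while the Student's t gradient is (ν + 1) z / (ν + z²), whose magnitude never exceeds (ν + 1) / (2√ν). The function names and the choice ν = 4 are illustrative.

```python
import numpy as np


def gaussian_nll_grad(z):
    """d/dz of -log N(z; 0, 1): grows linearly with the outlier size."""
    return z


def student_t_nll_grad(z, nu=4.0):
    """d/dz of -log t_nu(z): bounded, so outliers have limited influence."""
    return (nu + 1.0) * z / (nu + z**2)


z = np.array([0.5, 2.0, 10.0, 1000.0])
gauss_grads = gaussian_nll_grad(z)
t_grads = student_t_nll_grad(z, nu=4.0)
# Largest possible magnitude of the Student's t gradient, reached at |z| = sqrt(nu).
t_bound = (4.0 + 1.0) / (2.0 * np.sqrt(4.0))
```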
Generalized Maximum Entropy for Supervised Classification
Mazuelas, Santiago, Shen, Yuan, Pérez, Aritz
The maximum entropy principle advocates evaluating the probabilities of events using the distribution that maximizes entropy among those that satisfy certain expectation constraints. This principle can be generalized to arbitrary decision problems, where it corresponds to minimax approaches. This paper establishes a framework for supervised classification based on the generalized maximum entropy principle that leads to minimax risk classifiers (MRCs). We develop learning techniques that determine MRCs for general entropy functions and provide performance guarantees by means of convex optimization. In addition, we describe the relationship of the presented techniques with existing classification methods, and quantify MRCs' performance in comparison with the proposed bounds and conventional methods.
Characteristics of Monte Carlo Dropout in Wide Neural Networks
Sicking, Joachim, Akila, Maram, Wirtz, Tim, Houben, Sebastian, Fischer, Asja
Monte Carlo (MC) dropout is one of the state-of-the-art approaches for uncertainty estimation in neural networks (NNs). It has been interpreted as approximately performing Bayesian inference. Based on previous work on the approximation of Gaussian processes by wide and deep neural networks with random weights, we study the limiting distribution of wide untrained NNs under dropout more rigorously and prove that they also converge to Gaussian processes for fixed sets of weights and biases. We sketch an argument that this property might also hold for infinitely wide feed-forward networks that are trained with (full-batch) gradient descent. The theory is contrasted with an empirical analysis in which we find correlations and non-Gaussian behaviour for the pre-activations of finite-width NNs. We therefore investigate how (strongly) correlated pre-activations can induce non-Gaussian behaviour in NNs with strongly correlated weights.
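For readers unfamiliar with MC dropout, the procedure can be sketched as repeated stochastic forward passes whose spread serves as an uncertainty estimate. The example below is a minimal illustration under assumptions that are not from the paper: a single hidden layer with ReLU activations, random untrained weights, a width of 512, and a dropout rate of 0.5. The function name `mc_dropout_predict` is hypothetical.

```python
import numpy as np


def mc_dropout_predict(x, W1, b1, W2, b2, p_drop=0.5, n_samples=200, seed=0):
    """Predictive mean and variance from repeated dropout forward passes."""
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(n_samples):
        h = np.maximum(0.0, x @ W1 + b1)       # hidden ReLU layer
        mask = rng.random(h.shape) >= p_drop   # keep each unit with prob 1 - p_drop
        h = h * mask / (1.0 - p_drop)          # inverted-dropout rescaling
        outputs.append(h @ W2 + b2)
    outputs = np.stack(outputs)                # shape: (n_samples, n_inputs, 1)
    return outputs.mean(axis=0), outputs.var(axis=0)


# Untrained network with random weights (illustrative sizes and scaling).
init_rng = np.random.default_rng(1)
width = 512
W1 = init_rng.normal(size=(1, width))
b1 = init_rng.normal(size=width)
W2 = init_rng.normal(size=(width, 1)) / np.sqrt(width)
b2 = np.zeros(1)

x = np.linspace(-2.0, 2.0, 5).reshape(-1, 1)
mean, var = mc_dropout_predict(x, W1, b1, W2, b2)
```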
Variational Inference with Continuously-Indexed Normalizing Flows
Caterini, Anthony, Cornish, Rob, Sejdinovic, Dino, Doucet, Arnaud
Continuously-indexed flows (CIFs) have recently achieved improvements over baseline normalizing flows in a variety of density estimation tasks. In this paper, we adapt CIFs to the task of variational inference (VI) through the framework of auxiliary VI, and demonstrate that the advantages of CIFs over baseline flows can also translate to the VI setting for both sampling from posteriors with complicated topology and performing maximum likelihood estimation in latent-variable models.
Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions
From February to April 2020, many countries introduced variations on social distancing measures to slow the ravages of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Publicly available data show that Germany has been particularly successful in minimizing death rates. Dehning et al. quantified three governmental interventions introduced to control the outbreak. The authors predicted that the third governmental intervention, a strict contact ban since 22 March, switched incidence from growth to decay. They emphasize that relaxation of controls must be done carefully, not only because there is a 2-week lag between a measure being enacted and the effect on case reports but also because the three measures used in Germany only just kept virus spread below the growth threshold. Science, this issue p. eabb9789
INTRODUCTION: When faced with the outbreak of a novel epidemic such as coronavirus disease 2019 (COVID-19), rapid response measures are required by individuals, as well as by society as a whole, to mitigate the spread of the virus. During this initial, time-critical period, neither the central epidemiological parameters nor the effectiveness of interventions such as cancellation of public events, school closings, or social distancing is known.
RATIONALE: As one of the key epidemiological parameters, we inferred the spreading rate λ from confirmed SARS-CoV-2 infections using the example of Germany. We apply Bayesian inference based on Markov chain Monte Carlo sampling to a class of compartmental models [susceptible-infected-recovered (SIR)]. Our analysis characterizes the temporal change of the spreading rate and allows us to identify potential change points. Furthermore, it enables short-term forecast scenarios that assume various degrees of social distancing. A detailed description is provided in the accompanying paper, and the models, inference, and forecasts are available on GitHub (https://github.com/Priesemann-Group/covid19_inference_forecast). Although we apply the model to Germany, our approach can be readily adapted to other countries or regions.
RESULTS: In Germany, interventions to contain the COVID-19 outbreak were implemented in three steps over 3 weeks: (i) Around 9 March 2020, large public events such as soccer matches were canceled; (ii) around 16 March 2020, schools, childcare facilities, and many stores were closed; and (iii) on 23 March 2020, a far-reaching contact ban (Kontaktsperre) was imposed by government authorities; this included the prohibition of even small public gatherings as well as the closing of restaurants and all nonessential stores. From the observed case numbers of COVID-19, we can quantify the impact of these measures on the disease spread using change point analysis. Essentially, we find that at each change point the spreading rate λ decreased by ~40%. At the first change point, assumed around 9 March 2020, λ decreased from 0.43 to 0.25, with 95% credible intervals (CIs) of [0.35, 0.51] and [0.20, 0.30], respectively. At the second change point, assumed around 16 March 2020, λ decreased to 0.15 (CI [0.12, 0.20]). Both changes in λ slowed the spread of the virus but still implied exponential growth (see red and orange traces in the figure). To contain the disease spread, i.e., to turn exponential growth into a decline of new cases, the spreading rate has to be smaller than the recovery rate μ = 0.13 (CI [0.09, 0.18]). This critical transition was reached with the third change point, which resulted in λ = 0.09 (CI [0.06, 0.13]; see blue trace in the figure), assumed around 23 March 2020. From the peak position of daily new cases, one could conclude that the transition from growth to decline was already reached at the end of March. However, the observed transient decline can be explained by a short-term effect that originates from a sudden change in the spreading rate (see Fig. 2C in the main text). As long as interventions and the concurrent individual behavior frequently change the spreading rate, reliable short- and long-term forecasts are very difficult. As the figure shows, the three example scenarios (representing the effects up to the first, second, and third change point) quickly diverge from each other and, consequently, span a considerable range of future case numbers. Inference and subsequent forecasts are further complicated by the delay of ~2 weeks between an intervention and the first useful estimates of the new λ (which are derived from the reported case numbers). Because of this delay, any uncertainty in the magnitude of social distancing in the previous 2 weeks can have a major impact on the case numbers in the subsequent 2 weeks. Beyond 2 weeks, the case numbers depend on our future behavior, for which we must make explicit assumptions. In sum, future interventions (such as lifting restrictions) should be implemented cautiously to respect the delayed visibility of their effects.
CONCLUSION: We developed a Bayesian framework for the spread of COVID-19 to infer central epidemiological parameters and the timing and magnitude of intervention effects. With such an approach, the effects of interventions can be assessed in a timely manner. Future interventions and lifting of restrictions can be modeled as additional change points, enabling short-term forecasts for case numbers. In general, our approach may help to infer the efficiency of measures taken in other countries and inform policy-makers about tightening, loosening, and selecting appropriate measures for containment of COVID-19.
Figure: Bayesian inference of SIR model parameters from daily new cases of COVID-19 enables us to assess the impact of interventions. In Germany, three interventions (mild social distancing, strong social distancing, and contact ban) were enacted consecutively (circles). Colored lines depict the inferred models that include the impact of one, two, or three interventions (red, orange, or green, respectively, with individual data cutoff) or all available data until 21 April 2020 (blue). Forecasts (dashed lines) show how case numbers would have developed without the effects of the subsequent change points. Note the delay between intervention and first possible inference of parameters caused by the reporting delay and the necessary accumulation of evidence (gray arrows). Shaded areas indicate 50% and 95% Bayesian credible intervals.
As coronavirus disease 2019 (COVID-19) is rapidly spreading across the globe, short-term modeling forecasts provide time-critical information for decisions on containment and mitigation strategies. A major challenge for short-term forecasts is the assessment of key epidemiological parameters and how they change when first interventions show an effect. By combining an established epidemiological model with Bayesian inference, we analyzed the time dependence of the effective growth rate of new infections. Focusing on COVID-19 spread in Germany, we detected change points in the effective growth rate that correlate well with the times of publicly announced interventions. Thereby, we could quantify the effect of interventions and incorporate the corresponding change points into forecasts of future scenarios and case numbers. Our code is freely available and can be readily adapted to any country or region.
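The role of the threshold λ < μ can be seen in a small forward simulation. The sketch below is not the authors' code and performs no inference: it runs a discrete-time SIR model with the posterior point estimates quoted above (λ = 0.43, 0.25, 0.15, 0.09 and μ = 0.13), applies each change as an instantaneous step, and ignores the reporting delay. The population of 83 million, the initial 1000 infected, and the day indexing (day 0 taken as 1 March 2020) are assumptions made for illustration, as is the function name `simulate_sir`.

```python
import numpy as np


def simulate_sir(lambdas, change_days, mu, n_days, population, i0):
    """Daily new infections from a discrete-time SIR model with step changes in lambda."""
    S, I = population - i0, float(i0)
    new_cases = np.empty(n_days)
    for t in range(n_days):
        # Spreading rate in force on day t (switches on each change day).
        lam = lambdas[np.searchsorted(change_days, t, side="right")]
        infections = lam * S * I / population
        recoveries = mu * I
        S -= infections
        I += infections - recoveries
        new_cases[t] = infections
    return new_cases


lambdas = [0.43, 0.25, 0.15, 0.09]   # point estimates quoted in the abstract
change_days = [8, 15, 22]            # 9, 16, and 23 March if day 0 is 1 March (assumed)
new_cases = simulate_sir(lambdas, change_days, mu=0.13, n_days=40,
                         population=83e6, i0=1000)
```

In this simplified setting, daily new cases keep growing through the first two change points, since λ stays above μ, and only start to decline after the third.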
Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
Yu, Tong, Kveton, Branislav, Wen, Zheng, Zhang, Ruiyi, Mengshoel, Ole J.
We propose a novel framework for structured bandits, which we call an influence diagram bandit. Our framework captures complex statistical dependencies between actions, latent variables, and observations; and thus unifies and extends many existing models, such as combinatorial semi-bandits, cascading bandits, and low-rank bandits. We develop novel online learning algorithms that learn to act efficiently in our models. The key idea is to track a structured posterior distribution of model parameters, either exactly or approximately. To act, we sample model parameters from their posterior and then use the structure of the influence diagram to find the most optimistic action under the sampled parameters. We empirically evaluate our algorithms in three structured bandit problems, and show that they perform as well as or better than problem-specific state-of-the-art baselines.
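The sample-then-act loop described above is the core of Thompson sampling. The sketch below shows it in the simplest unstructured setting, a Bernoulli bandit with independent Beta posteriors, rather than the influence diagram model or the variational approximation of the paper. The arm means, horizon, and function name `thompson_sampling` are illustrative assumptions.

```python
import numpy as np


def thompson_sampling(true_means, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was pulled."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)   # Beta posterior parameters: 1 + observed successes
    beta = np.ones(k)    # Beta posterior parameters: 1 + observed failures
    pulls = np.zeros(k, dtype=int)
    for _ in range(n_rounds):
        theta = rng.beta(alpha, beta)      # sample model parameters from the posterior
        arm = int(np.argmax(theta))        # act greedily under the sampled parameters
        reward = int(rng.random() < true_means[arm])
        alpha[arm] += reward               # exact conjugate posterior update
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.8])
```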
Training Restricted Boltzmann Machines with Binary Synapses using the Bayesian Learning Rule
Restricted Boltzmann machines (RBMs) with low-precision synapses are appealing for their high energy efficiency. However, training RBMs with binary synapses is challenging because the synapses are discrete. Recently, Huang proposed an efficient method to train RBMs with binary synapses by combining gradient ascent with the message passing algorithm under the variational inference framework. However, an additional heuristic clipping operation is needed. In this technical note, inspired by Huang's work, we propose an alternative optimization method using the Bayesian learning rule, which is a natural-gradient variational inference method. As opposed to Huang's method, we update the natural parameters of the variational symmetric Bernoulli distribution rather than the expectation parameters. Since the natural parameters take values in the entire real domain, no additional clipping is needed. Interestingly, Huang's algorithm can be viewed as a first-order approximation of the proposed algorithm, which justifies its efficacy with heuristic clipping.
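The point about clipping can be illustrated with the two parameterizations of a distribution over a binary synapse w in {-1, +1}. Writing q(w) proportional to exp(θw), the natural parameter θ is any real number and the mean is m = tanh(θ). The sketch below is not the Bayesian learning rule itself: it applies the same hypothetical additive step to each parameterization to show why a step on m may need clipping while a step on θ does not. The starting mean and step size are illustrative.

```python
import numpy as np


def mean_from_natural(theta):
    """Mean of w in {-1, +1} under q(w) proportional to exp(theta * w)."""
    return np.tanh(theta)


m = 0.9      # current expectation parameter (illustrative)
step = 0.5   # hypothetical update, not derived from any RBM objective

# Updating the expectation parameter directly can leave the valid range [-1, 1].
m_direct = m + step
m_clipped = np.clip(m_direct, -1.0, 1.0)

# Updating the natural parameter always maps back to a valid mean.
theta = np.arctanh(m)
m_natural = mean_from_natural(theta + step)
```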
Non-parametric Models for Non-negative Functions
Marteau-Ferey, Ulysse, Bach, Francis, Rudi, Alessandro
Linear models have shown great effectiveness and flexibility in many fields such as machine learning, signal processing and statistics. They can represent rich spaces of functions while preserving the convexity of the optimization problems where they are used, and are simple to evaluate, differentiate and integrate. However, for modeling non-negative functions, which are crucial for unsupervised learning, density estimation, or non-parametric Bayesian methods, linear models are not applicable directly. Moreover, current state-of-the-art models like generalized linear models either lead to non-convex optimization problems, or cannot be easily integrated. In this paper we provide the first model for non-negative functions which benefits from the same good properties of linear models. In particular, we prove that it admits a representer theorem and provide an efficient dual formulation for convex problems. We study its representation power, showing that the resulting space of functions is strictly richer than that of generalized linear models. Finally we extend the model and the theoretical results to functions with outputs in convex cones. The paper is complemented by an experimental evaluation of the model showing its effectiveness in terms of formulation, algorithmic derivation and practical results on the problems of density estimation, regression with heteroscedastic errors, and multiple quantile regression.