Bayesian Inference: Overviews

A Framework for Testing Identifiability of Bayesian Models of Perception

Neural Information Processing Systems

Bayesian observer models are very effective in describing human performance in perceptual tasks, so much so that they are trusted to faithfully recover hidden mental representations of priors, likelihoods, or loss functions from the data. However, the intrinsic degeneracy of the Bayesian framework, whereby multiple combinations of elements can yield empirically indistinguishable results, prompts the question of model identifiability. We propose a novel framework for systematically testing the identifiability of a significant class of Bayesian observer models, with practical applications for improving experimental design. We examine the theoretical identifiability of the inferred internal representations in two case studies. First, we show which experimental designs work best to remove the underlying degeneracy in a time interval estimation task.
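
As a toy illustration of the degeneracy at issue (not the paper's framework), consider the standard Gaussian observer: the Bayesian estimate depends on the prior width and the sensory noise only through their ratio, so distinct prior/likelihood pairs produce identical behaviour. A minimal sketch, with hypothetical parameter values:

```python
import numpy as np

def bayes_estimate(x, mu_prior, sigma_prior, sigma_like):
    """Posterior mean under a Gaussian prior N(mu_prior, sigma_prior^2)
    and a Gaussian likelihood N(x | theta, sigma_like^2)."""
    w = sigma_prior**2 / (sigma_prior**2 + sigma_like**2)
    return w * x + (1 - w) * mu_prior

x = np.linspace(0.0, 2.0, 50)   # hypothetical stimulus magnitudes

# Two different (prior width, sensory noise) settings with the same ratio
# sigma_prior / sigma_like yield identical estimates for every stimulus:
est_a = bayes_estimate(x, mu_prior=1.0, sigma_prior=0.2, sigma_like=0.1)
est_b = bayes_estimate(x, mu_prior=1.0, sigma_prior=0.4, sigma_like=0.2)
print(np.allclose(est_a, est_b))   # True: the two observers are indistinguishable
```

Only experimental manipulations that break this ratio-dependence (e.g. varying the noise level across conditions) can separate the two models.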

The Empirical Derivation of the Bayesian Formula Open Data Science Conference

Editor's note: James is a speaker for ODSC London this November! Be sure to check out his talk, "The How, Why, and When of Replacing Engineering Work with Compute Power" there. Deep learning has been made practical through modern computing power, but it is not the only technique benefiting from this large increase in power. Bayesian inference is an up-and-coming technique whose recent progress is powered by the increase in computing power. We can explain the mathematical expression of Bayes' formula by adapting an example in the spirit of the great Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference to a financial context, letting the mathematical concepts arise intuitively from code.
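
In that spirit, a minimal sketch of deriving Bayes' formula empirically, using a hypothetical loan-default screening example (the numbers are illustrative, not from the talk): counting within a simulated population recovers the same conditional probability that the formula gives analytically.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical financial example: 3% of loans default; a screening model
# flags 90% of defaulting loans but also 10% of healthy ones.
default = rng.random(n) < 0.03
flagged = np.where(default, rng.random(n) < 0.90, rng.random(n) < 0.10)

# Empirical: just count within the simulated population.
p_default_given_flag = (default & flagged).sum() / flagged.sum()

# Bayes' formula: P(D|F) = P(F|D) P(D) / P(F).
p_flag = 0.90 * 0.03 + 0.10 * 0.97
analytic = 0.90 * 0.03 / p_flag

print(p_default_given_flag, analytic)   # the two agree closely
```

The counting version is the "empirical derivation": the formula is just what the ratio of counts converges to as the population grows.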

Distributed Bayesian Computation for Model Choice Machine Learning

We propose a general method for distributed Bayesian model choice, where each worker has access only to non-overlapping subsets of the data. Our approach approximates the model evidence for the full data set through Monte Carlo sampling from the posterior on each subset, generating a model evidence per subset. The per-worker model evidences are then consistently combined using a novel approach that corrects for the splitting using summary statistics of the generated samples. This divide-and-conquer approach allows Bayesian model choice in the large-data setting, exploiting all available information while limiting communication between workers. Our work thereby complements the work on consensus Monte Carlo (Scott et al., 2016) by explicitly enabling model choice. In addition, we show how the suggested approach can be extended to model choice within a reversible jump setting that explores multiple models within one run.
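
A toy sketch of why such a correction is needed (the parameter values and the exact-evidence shortcut are illustrative, not the paper's method): for a conjugate Gaussian model, the naive product of per-subset evidences re-uses the prior once per subset and therefore disagrees with the full-data evidence.

```python
import numpy as np

rng = np.random.default_rng(1)
s, t = 1.0, 2.0                      # likelihood and prior standard deviations
y = rng.normal(0.5, s, size=40)      # synthetic data from one shared mean

def log_evidence(data):
    """Exact log marginal likelihood of y_i ~ N(theta, s^2) with prior
    theta ~ N(0, t^2): jointly, y ~ N(0, s^2 I + t^2 11^T)."""
    n = len(data)
    cov = s**2 * np.eye(n) + t**2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    quad = data @ np.linalg.solve(cov, data)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

full = log_evidence(y)
shards = np.array_split(y, 4)                  # 4 workers, disjoint subsets
naive = sum(log_evidence(sh) for sh in shards)

# The naive product of per-shard evidences applies the prior four times
# over, so it does not recover the full-data evidence; a consistent
# combination must correct for the splitting.
print(full, naive)
```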

Distilling importance sampling Machine Learning

The two main approaches to Bayesian inference are sampling and optimisation methods. However, many complicated posteriors are difficult to approximate by either. Therefore we propose a novel approach combining features of both. We use a flexible parameterised family of densities, such as a normalising flow. Given a density from this family approximating the posterior, we use importance sampling to produce a weighted sample from a more accurate posterior approximation. This sample is then used in optimisation to update the parameters of the approximate density, a process we refer to as "distilling" the importance sampling results. We illustrate our method in a queueing model example.
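
A minimal sketch of the distillation loop, using a Gaussian approximating family with moment matching in place of the paper's normalising flow (the target and all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    """Unnormalised log posterior: a Gaussian with mean 3 and std 0.5,
    standing in for an intractable posterior."""
    return -0.5 * ((x - 3.0) / 0.5) ** 2

mu, sigma = 0.0, 2.0   # initial approximating density q = N(mu, sigma^2)
for _ in range(30):
    x = rng.normal(mu, sigma, size=5000)              # sample from q
    log_q = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)
    log_w = log_target(x) - log_q                     # importance weights
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # "Distil": refit q to the weighted sample (moment matching here;
    # the paper instead optimises a normalising flow on this sample).
    mu = np.sum(w * x)
    sigma = np.sqrt(np.sum(w * (x - mu) ** 2))

print(mu, sigma)   # approaches the target's mean 3 and std 0.5
```

Each round, importance sampling sharpens the current approximation, and the refit "distils" that improvement back into the parametric family.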

An Optimal Transport Formulation of the Ensemble Kalman Filter Machine Learning

Controlled interacting particle systems such as the ensemble Kalman filter (EnKF) and the feedback particle filter (FPF) are numerical algorithms to approximate the solution of the nonlinear filtering problem in continuous time. The distinguishing feature of these algorithms is that the Bayesian update step is implemented using a feedback control law. It has been noted in the literature that the control law is not unique. This is the main problem addressed in this paper. To obtain a unique control law, the filtering problem is formulated here as an optimal transportation problem. An explicit formula for the (mean-field type) optimal control law is derived in the linear Gaussian setting. Comparisons are made with the control laws for different types of EnKF algorithms described in the literature. Via empirical approximation of the mean-field control law, a finite-$N$ controlled interacting particle algorithm is obtained. For this algorithm, the equations for empirical mean and covariance are derived and shown to be identical to the Kalman filter. This allows strong conclusions on convergence and error properties based on the classical filter stability theory for the Kalman filter. It is shown that, under certain technical conditions, the mean squared error (m.s.e.) converges to zero even with a finite number of particles. A detailed propagation of chaos analysis is carried out for the finite-$N$ algorithm. The analysis is used to prove weak convergence of the empirical distribution as $N\rightarrow\infty$. For a certain simplified filtering problem, analytical comparison of the m.s.e. with the importance sampling-based algorithms is described. The analysis helps explain the favorable scaling properties of the control-based algorithms reported in several numerical studies in recent literature.
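
A minimal sketch of a single stochastic EnKF analysis step for a scalar linear-Gaussian problem (a standard textbook construction, not the optimal-transport control law derived in the paper), showing the ensemble moments matching the exact Kalman update:

```python
import numpy as np

rng = np.random.default_rng(42)

# One stochastic EnKF analysis step, scalar linear-Gaussian setting:
# prior x ~ N(0, 1), observation y = x + v with v ~ N(0, R).
N, R = 100_000, 0.5**2
y_obs = 1.2

ens = rng.normal(0.0, 1.0, size=N)   # prior ensemble
P = np.var(ens)                      # ensemble (prior) covariance
K = P / (P + R)                      # Kalman gain from the ensemble
# Each member assimilates a perturbed observation; this is one particular
# choice of the (non-unique) feedback control law discussed above.
perturbed = y_obs + rng.normal(0.0, np.sqrt(R), size=N)
ens = ens + K * (perturbed - ens)

# The ensemble moments match the exact Kalman posterior,
# mean K*y_obs = 0.96 and variance (1-K)*P = 0.2, up to Monte Carlo error.
print(ens.mean(), ens.var())
```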

bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond) Machine Learning

Over the last decades, the challenges in applied regression and in predictive modeling have been changing considerably: (1) More flexible model specifications are needed as big(ger) data become available, facilitated by more powerful computing infrastructure. (2) Full probabilistic modeling rather than predicting just means or expectations is crucial in many applications. (3) Interest in Bayesian inference has been increasing both as an appealing framework for regularizing or penalizing model estimation as well as a natural alternative to classical frequentist inference. However, while there has been a lot of research in all three areas, also leading to associated software packages, a modular software implementation that allows to easily combine all three aspects has not yet been available. For filling this gap, the R package bamlss is introduced for Bayesian additive models for location, scale, and shape (and beyond). At the core of the package are algorithms for highly-efficient Bayesian estimation and inference that can be applied to generalized additive models (GAMs) or generalized additive models for location, scale, and shape (GAMLSS), also known as distributional regression. However, its building blocks are designed as "Lego bricks" encompassing various distributions (exponential family, Cox, joint models, ...), regression terms (linear, splines, random effects, tensor products, spatial fields, ...), and estimators (MCMC, backfitting, gradient boosting, lasso, ...). It is demonstrated how these can be easily recombined to make classical models more flexible or create new custom models for specific modeling challenges.

Machine Discovery of Partial Differential Equations from Spatiotemporal Data Machine Learning

The study presents a general framework for discovering underlying Partial Differential Equations (PDEs) using measured spatiotemporal data. The method, called Sparse Spatiotemporal System Discovery ($\text{S}^3\text{d}$), decides which physical terms are necessary and which can be removed (because they are physically negligible in the sense that they do not affect the dynamics too much) from a pool of candidate functions. The method is built on the recent development of Sparse Bayesian Learning, which enforces sparsity in the to-be-identified PDEs and can therefore balance model complexity and fitting error with theoretical guarantees. Without leveraging prior knowledge or assumptions in the discovery process, we use an automated approach to discover ten types of PDEs, including the famous Navier-Stokes and sine-Gordon equations, from simulation data alone. Moreover, we demonstrate our data-driven discovery process with the Complex Ginzburg-Landau Equation (CGLE) using data measured from a traveling-wave convection experiment. Our machine discovery approach presents solutions that have the potential to inspire, support and assist physicists in establishing physical laws from measured spatiotemporal data, especially in fields that are notoriously too complex to allow a straightforward establishment of physical law, such as biophysics, fluid dynamics, neuroscience or nonlinear optics.
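
To illustrate the sparse-selection idea, here is a toy sketch using sequentially thresholded least squares as a simple stand-in for Sparse Bayesian Learning, and a one-dimensional ODE rather than a PDE (the dynamics and library are made up for illustration):

```python
import numpy as np

# Trajectory of the known system dx/dt = -2x, to be rediscovered.
t = np.linspace(0, 2, 400)
x = 3.0 * np.exp(-2.0 * t)
dxdt = np.gradient(x, t)          # numerical derivative (noisy in practice)

# Pool of candidate terms: [1, x, x^2, x^3].
Theta = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Sequentially thresholded least squares: repeatedly prune small
# coefficients and refit on the surviving terms.
coef = np.linalg.lstsq(Theta, dxdt, rcond=None)[0]
for _ in range(10):
    small = np.abs(coef) < 0.1
    coef[small] = 0.0
    big = ~small
    coef[big] = np.linalg.lstsq(Theta[:, big], dxdt, rcond=None)[0]

print(coef)   # only the x term survives, with coefficient close to -2
```

The Bayesian version replaces the hard threshold with sparsity-inducing priors, which is what gives the method its theoretical guarantees on the complexity/fit trade-off.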

Bayesian Network Based Risk and Sensitivity Analysis for Production Process Stability Control Machine Learning

The biomanufacturing industry is growing rapidly and becoming one of the key drivers of personalized medicine and life science. However, biopharmaceutical production faces critical challenges, including complexity, high variability, long lead times, and rapid changes in technologies, processes, and the regulatory environment. Driven by these challenges, we explore biotechnology domain knowledge and propose a rigorous risk and sensitivity analysis framework for biomanufacturing innovation. Building on the causal relationships among raw material quality attributes, the production process, and bio-drug safety and efficacy properties, we develop a Bayesian Network (BN) to model the complex probabilistic interdependence between process parameters and the quality attributes of raw materials, in-process materials, and the drug substance. It integrates various sources of data and leads to an interpretable probabilistic knowledge graph of the end-to-end production process. We then introduce a systematic risk analysis to assess the criticality of process parameters and quality attributes. Complex production processes often involve many process parameters and quality attributes that impact product quality variability. However, real-world (batch) data are often limited, especially for customized and personalized bio-drugs. We propose uncertainty quantification and sensitivity analysis to analyze the impact of model risk. Given very limited process data, the empirical results show that we can provide reliable and interpretable results. Thus, the proposed framework can provide science- and risk-based guidance on process monitoring, data collection, and process parameter specifications to facilitate production process learning and stability control. Keywords: Decision analysis, biomanufacturing, Bayesian network, production process risk analysis, sensitivity analysis

1. Introduction

In the past decades, pharmaceutical companies have invested billions of dollars in the research and development (R&D) of new biomedicines for the treatment of many severe illnesses, including cancer and adult blindness. More than 40 percent of overall pharmaceutical industry R&D, and of products in the development pipeline, are biopharmaceuticals, and this share is expected to keep increasing. Compared to classical pharmaceutical manufacturing, biopharmaceutical production faces several challenges, including complexity, high variability, long lead times, and rapid changes in technologies, processes, and the regulatory environment (Kaminsky & Wang, 2015). Biotechnology products are produced in living organisms, which introduces substantial uncertainty into the production process.
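
A minimal sketch of the kind of probabilistic knowledge graph involved: a hypothetical three-node chain from raw-material quality to drug quality, with inference by enumeration and a one-parameter sensitivity check (all probabilities here are made up for illustration and are not from the paper):

```python
# Hypothetical chain: raw material quality (M) -> in-process state (S)
# -> drug quality (Q), each binary with 1 = good.
p_m = 0.9                                  # P(M=1)
p_s_given_m = {1: 0.95, 0: 0.40}           # P(S=1 | M)
p_q_given_s = {1: 0.97, 0: 0.30}           # P(Q=1 | S)

def p_q_good(p_m, p_s_given_m, p_q_given_s):
    """P(Q=1) by enumeration over all parent configurations."""
    total = 0.0
    for m in (0, 1):
        pm = p_m if m == 1 else 1 - p_m
        for s in (0, 1):
            ps = p_s_given_m[m] if s == 1 else 1 - p_s_given_m[m]
            total += pm * ps * p_q_given_s[s]
    return total

base = p_q_good(p_m, p_s_given_m, p_q_given_s)

# Sensitivity analysis: perturb the raw-material quality probability and
# measure the resulting change in final product quality.
perturbed = p_q_good(0.8, p_s_given_m, p_q_given_s)
print(base, perturbed)   # product quality drops as raw-material quality drops
```

Scaling this up, the criticality of each parameter can be ranked by how strongly such perturbations propagate to the drug-substance quality.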

Deep Neural Network Ensembles against Deception: Ensemble Diversity, Accuracy and Robustness Machine Learning

Ensemble learning is a methodology that integrates multiple DNN learners to improve prediction performance over the individual learners. Diversity is greater when the errors of the ensemble members are more uniformly distributed, and greater diversity is highly correlated with an increase in ensemble accuracy. Another attractive property of diversity-optimized ensemble learning is its robustness against deception: an adversarial perturbation attack can mislead one DNN model into misclassifying but may not fool the other ensemble members consistently. In this paper we first give an overview of the concept of ensemble diversity and examine three types of ensemble diversity in the context of DNN classifiers. We then describe a set of ensemble diversity measures and a suite of algorithms for creating diverse ensembles and for performing ensemble consensus (voted or learned), generating high-accuracy ensemble output by strategically combining the outputs of individual members. The paper concludes with a discussion of open issues in quantifying ensemble diversity for robust deep learning.
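
A minimal sketch of one common diversity measure, pairwise disagreement, together with majority-vote consensus on hypothetical predictions (the data are made up; the paper's measures and algorithms are more elaborate):

```python
import numpy as np

# Hypothetical binary predictions of three learners on ten test points,
# plus the true labels. Each learner is 80% accurate, but their errors
# fall on different inputs.
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
preds = np.array([
    [1, 1, 1, 0, 1, 0, 0, 0, 1, 0],   # learner A
    [1, 1, 0, 1, 1, 0, 0, 1, 0, 0],   # learner B
    [0, 1, 1, 1, 1, 1, 0, 0, 0, 0],   # learner C
])

def disagreement(p, q):
    """Fraction of inputs on which two learners disagree."""
    return np.mean(p != q)

pairs = [(0, 1), (0, 2), (1, 2)]
diversity = np.mean([disagreement(preds[i], preds[j]) for i, j in pairs])

# Majority vote: because errors don't overlap, they cancel out.
votes = (preds.sum(axis=0) >= 2).astype(int)
individual_acc = (preds == y_true).mean(axis=1)
ensemble_acc = (votes == y_true).mean()
print(diversity, individual_acc, ensemble_acc)  # ensemble beats every member
```

The same non-overlap of errors is what makes a diverse ensemble hard to deceive: an adversarial input crafted against one member rarely transfers to all of them.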

Marginally-calibrated deep distributional regression Machine Learning

Deep neural network (DNN) regression models are widely used in applications requiring state-of-the-art predictive accuracy. However, until recently there has been little work on accurate uncertainty quantification for predictions from such models. We add to this literature by outlining an approach to constructing predictive distributions that are "marginally calibrated". This is where the long run average of the predictive distributions of the response variable matches the observed empirical margin. Our approach considers a DNN regression with a conditionally Gaussian prior for the final layer weights, from which an implicit copula process on the feature space is extracted. This copula process is combined with a non-parametrically estimated marginal distribution for the response. The end result is a scalable distributional DNN regression method with marginally calibrated predictions, and our work complements existing methods for probability calibration. The approach is first illustrated using two applications of dense layer feed-forward neural networks. However, our main motivating applications are in likelihood-free inference, where distributional deep regression is used to estimate marginal posterior distributions. In two complex ecological time series examples we employ the implicit copulas of convolutional networks, and show that marginal calibration results in improved uncertainty quantification. Our approach also avoids the need for manual specification of summary statistics, a requirement that is burdensome for users and typical of competing likelihood-free inference methods.
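
A rough sketch of the marginal-calibration idea, with a linear model on normal scores standing in for the DNN-induced implicit copula (the data and model are illustrative, not the paper's construction):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
nd = NormalDist()

# Toy data with a skewed response: a Gaussian regression on the raw scale
# would be marginally mis-calibrated.
n = 2000
x = rng.normal(size=n)
y = np.exp(0.8 * x + 0.5 * rng.normal(size=n))   # log-normal response

# 1. Non-parametric margin: map y to normal scores via its empirical CDF.
u = (y.argsort().argsort() + 1) / (n + 1)
z = np.array([nd.inv_cdf(ui) for ui in u])

# 2. Regression on the normal scores (a linear stand-in for the DNN).
beta = np.polyfit(x, z, 1)
resid_sd = np.std(z - np.polyval(beta, x))

# 3. Predictive draws on the z-scale, mapped back through the empirical
#    quantiles of y, so that pooled draws reproduce the observed margin.
z_draws = np.polyval(beta, x) + resid_sd * rng.normal(size=n)
u_draws = np.array([nd.cdf(zi) for zi in z_draws])
y_draws = np.sort(y)[np.clip((u_draws * n).astype(int), 0, n - 1)]

# Marginal calibration: the pooled predictive distribution matches the
# empirical margin of the response, e.g. at the median.
print(np.median(y_draws), np.median(y))
```

The copula carries the feature dependence while the margin is estimated non-parametrically, which is how the long-run average of the predictive distributions can match the empirical margin by construction.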