Bayesian Learning: Overviews

A Framework for Testing Identifiability of Bayesian Models of Perception

Neural Information Processing Systems

Bayesian observer models are very effective in describing human performance in perceptual tasks, so much so that they are trusted to faithfully recover hidden mental representations of priors, likelihoods, or loss functions from the data. However, the intrinsic degeneracy of the Bayesian framework, as multiple combinations of elements can yield empirically indistinguishable results, prompts the question of model identifiability. We propose a novel framework for a systematic testing of the identifiability of a significant class of Bayesian observer models, with practical applications for improving experimental design. We examine the theoretical identifiability of the inferred internal representations in two case studies. First, we show which experimental designs work better to remove the underlying degeneracy in a time interval estimation task.
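
As a minimal illustration of the kind of degeneracy described above, assume a Gaussian prior with mean $\mu_p$ and variance $\sigma_p^2$, Gaussian sensory noise with variance $\sigma^2$, and a squared-error loss. These modeling choices and symbols are assumptions made here, not taken from the paper. The observer's estimate for a noisy measurement $x$ is then the posterior mean:

$$
\hat{s}(x) = w\,x + (1 - w)\,\mu_p, \qquad w = \frac{\sigma_p^2}{\sigma_p^2 + \sigma^2}.
$$

Scaling $\sigma_p$ and $\sigma$ by a common factor leaves $w$ unchanged, so the mapping from measurement to estimate cannot, on its own, separate the width of the prior from the level of sensory noise. Other aspects of the responses, such as their variability, are needed to pull the two apart.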

Machine Learning for Recommender Systems - A Primer


The growth of ecommerce in the recent past can only be described as explosive and sweeping across the planet. According to a 2016 study, half of all dollars spent online in America go to Amazon. And consider this: recommendation engines alone drive 35% of that revenue. But it is not ecommerce alone that is reaping the huge benefits that recommendation engines have to offer. Direct-to-device streaming services such as Netflix and Spotify, among others, analyze user behavior almost to a micro-moment level, then gather data on similar users who are likely to buy the same items based on their browsing history, and provide that much-needed nudge to move on to the next purchase on the platform.

Learning from both experts and data Machine Learning

In this work we study the problem of inferring a discrete probability distribution using both expert knowledge and empirical data. This is an important issue for many applications where the scarcity of data prevents a purely empirical approach. In this context, it is common to rely first on a priori domain knowledge before proceeding to online data acquisition. We are particularly interested in the intermediate regime where we do not have enough data to do without the experts' initial a priori, but enough to correct it if necessary. We present here a novel way to tackle this issue with a method providing an objective way to choose the weight to be given to experts compared to data. We show, both empirically and theoretically, that our proposed estimator is always more efficient than the best of the two models (expert or data) within a constant.

The Empirical Derivation of the Bayesian Formula Open Data Science Conference


Editor's note: James is a speaker for ODSC London this November! Be sure to check out his talk, "The How, Why, and When of Replacing Engineering Work with Compute Power" there. Deep learning has been made practical through modern computing power, but it is not the only technique benefiting from this large increase in power. Bayesian inference is an up-and-coming technique whose recent progress is powered by the increase in computing power. We can explain the mathematical expression of Bayes' formula using an example similar to those in the great Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference, applied to a financial context, with the mathematical concepts arising intuitively from code.
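
The idea of letting Bayes' formula arise from code can be sketched as follows. This is a hypothetical illustration, not the article's own example: event A, signal B, and all the probabilities are made-up values. The script simulates many trials, estimates P(A | B) by counting, and compares the result with the value given by Bayes' formula.

```python
import random


def simulate_posterior(n_trials, p_a, p_b_given_a, p_b_given_not_a, seed=0):
    """Estimate P(A | B) by counting outcomes in simulated trials."""
    rng = random.Random(seed)
    count_b = 0
    count_a_and_b = 0
    for _ in range(n_trials):
        a = rng.random() < p_a
        # The chance of observing signal B depends on whether A happened.
        b = rng.random() < (p_b_given_a if a else p_b_given_not_a)
        if b:
            count_b += 1
            if a:
                count_a_and_b += 1
    return count_a_and_b / count_b


def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) = P(B | A) P(A) / P(B), with P(B) from total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    return p_b_given_a * p_a / p_b


# Hypothetical numbers chosen only for illustration.
empirical = simulate_posterior(200_000, p_a=0.3, p_b_given_a=0.8, p_b_given_not_a=0.1)
exact = bayes_posterior(p_a=0.3, p_b_given_a=0.8, p_b_given_not_a=0.1)
print(f"simulated: {empirical:.4f}  formula: {exact:.4f}")
```

The counting estimate is just the fraction of B-trials in which A also occurred, which is the ratio that Bayes' formula expresses in terms of probabilities.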

Distributed Bayesian Computation for Model Choice Machine Learning

We propose a general method for distributed Bayesian model choice, where each worker has access only to non-overlapping subsets of the data. Our approach approximates the model evidence for the full data set through Monte Carlo sampling from the posterior on every subset generating a model evidence per subset. The model evidences per worker are then consistently combined using a novel approach which corrects for the splitting using summary statistics of the generated samples. This divide-and-conquer approach allows Bayesian model choice in the large data setting, exploiting all available information but limiting communication between workers. Our work thereby complements the work on consensus Monte Carlo (Scott et al., 2016) by explicitly enabling model choice. In addition, we show how the suggested approach can be extended to model choice within a reversible jump setting that explores multiple models within one run.

An Optimal Transport Formulation of the Ensemble Kalman Filter Machine Learning

Controlled interacting particle systems such as the ensemble Kalman filter (EnKF) and the feedback particle filter (FPF) are numerical algorithms to approximate the solution of the nonlinear filtering problem in continuous time. The distinguishing feature of these algorithms is that the Bayesian update step is implemented using a feedback control law. It has been noted in the literature that the control law is not unique. This is the main problem addressed in this paper. To obtain a unique control law, the filtering problem is formulated here as an optimal transportation problem. An explicit formula for the (mean-field type) optimal control law is derived in the linear Gaussian setting. Comparisons are made with the control laws for different types of EnKF algorithms described in the literature. Via empirical approximation of the mean-field control law, a finite-$N$ controlled interacting particle algorithm is obtained. For this algorithm, the equations for empirical mean and covariance are derived and shown to be identical to the Kalman filter. This allows strong conclusions on convergence and error properties based on the classical filter stability theory for the Kalman filter. It is shown that, under certain technical conditions, the mean squared error (m.s.e.) converges to zero even with a finite number of particles. A detailed propagation of chaos analysis is carried out for the finite-$N$ algorithm. The analysis is used to prove weak convergence of the empirical distribution as $N\rightarrow\infty$. For a certain simplified filtering problem, analytical comparison of the m.s.e. with the importance sampling-based algorithms is described. The analysis helps explain the favorable scaling properties of the control-based algorithms reported in several numerical studies in recent literature.
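
For orientation, the relationship between an ensemble update and the Kalman filter can be sketched in a much simpler setting than the one studied in the paper. The code below is a discrete-time, scalar, perturbed-observation EnKF analysis step, which is a different variant from the continuous-time control laws discussed above. The function names and all numerical values are assumptions chosen for illustration.

```python
import random
import statistics


def enkf_update(ensemble, y, h, r, rng):
    """One perturbed-observation EnKF analysis step.

    Scalar state x, observation model y = h * x + noise with variance r.
    """
    p = statistics.variance(ensemble)  # sample variance of the prior ensemble
    gain = p * h / (h * h * p + r)     # Kalman gain from the sample variance
    # Each particle is nudged toward its own noisy copy of the observation.
    return [x + gain * (y + rng.gauss(0.0, r ** 0.5) - h * x) for x in ensemble]


def kalman_update(mean, var, y, h, r):
    """Exact Kalman filter update for the same scalar linear Gaussian model."""
    gain = var * h / (h * h * var + r)
    return mean + gain * (y - h * mean), (1.0 - gain * h) * var


rng = random.Random(0)
prior_mean, prior_var = 0.0, 4.0
ensemble = [rng.gauss(prior_mean, prior_var ** 0.5) for _ in range(20_000)]

posterior = enkf_update(ensemble, y=1.0, h=1.0, r=1.0, rng=rng)
enkf_mean = statistics.fmean(posterior)
enkf_var = statistics.variance(posterior)
kf_mean, kf_var = kalman_update(prior_mean, prior_var, y=1.0, h=1.0, r=1.0)
print(enkf_mean, kf_mean, enkf_var, kf_var)
```

With a large ensemble, the empirical mean and variance of the updated particles land close to the Kalman filter values. This is only a sampling-level approximation in this sketch, not the exact identity derived in the paper for its own algorithm.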

bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond) Machine Learning

Over the last decades, the challenges in applied regression and in predictive modeling have been changing considerably: (1) More flexible model specifications are needed as big(ger) data become available, facilitated by more powerful computing infrastructure. (2) Full probabilistic modeling rather than predicting just means or expectations is crucial in many applications. (3) Interest in Bayesian inference has been increasing both as an appealing framework for regularizing or penalizing model estimation and as a natural alternative to classical frequentist inference. However, while there has been a lot of research in all three areas, also leading to associated software packages, a modular software implementation that allows all three aspects to be combined easily has not yet been available. To fill this gap, the R package bamlss is introduced for Bayesian additive models for location, scale, and shape (and beyond). At the core of the package are algorithms for highly efficient Bayesian estimation and inference that can be applied to generalized additive models (GAMs) or generalized additive models for location, scale, and shape (GAMLSS), also known as distributional regression. However, its building blocks are designed as "Lego bricks" encompassing various distributions (exponential family, Cox, joint models, ...), regression terms (linear, splines, random effects, tensor products, spatial fields, ...), and estimators (MCMC, backfitting, gradient boosting, lasso, ...). It is demonstrated how these can be easily recombined to make classical models more flexible or create new custom models for specific modeling challenges.

Bayesian Network Based Risk and Sensitivity Analysis for Production Process Stability Control Machine Learning

The biomanufacturing industry is growing rapidly and becoming one of the key drivers of personalized medicine and life science. However, biopharmaceutical production faces critical challenges, including complexity, high variability, long lead time and rapid changes in technologies, processes, and regulatory environment. Driven by these challenges, we explore the biotechnology domain knowledge and propose a rigorous risk and sensitivity analysis framework for biomanufacturing innovation. Built on the causal relationships of raw material quality attributes, production process, and bio-drug properties in safety and efficacy, we develop a Bayesian Network (BN) to model the complex probabilistic interdependence between process parameters and quality attributes of raw materials/in-process materials/drug substance. It integrates various sources of data and leads to an interpretable probabilistic knowledge graph of the end-to-end production process. Then, we introduce a systematic risk analysis to assess the criticality of process parameters and quality attributes. The complex production processes often involve many process parameters and quality attributes that affect the variability of product quality. However, the real-world (batch) data are often limited, especially for customized and personalized bio-drugs. We propose uncertainty quantification and sensitivity analysis to analyze the impact of model risk. Given very limited process data, the empirical results show that we can provide reliable and inter- Thus, the proposed framework can provide science- and risk-based guidance on process monitoring, data collection, and process parameter specifications to facilitate production process learning and stability control.

Keywords: Decision analysis, biomanufacturing, Bayesian network, production process risk analysis, sensitivity analysis

1. Introduction

In the past decades, pharmaceutical companies have invested billions of dollars in the research and development (R&D) of new biomedicines for the treatment of many severe illnesses, including cancer and adult blindness. More than 40 percent of the overall pharmaceutical industry R&D and products in the development pipeline are biopharmaceuticals, and this percentage is expected to keep increasing. Compared to classical pharmaceutical manufacturing, biopharmaceutical production faces several challenges, including complexity, high variability, long lead time and rapid changes in technologies, processes, and regulatory environment (Kaminsky & Wang, 2015). Biotechnology products are produced in living organisms, which induces a lot of uncertainty in the production process.

Deep Neural Network Ensembles against Deception: Ensemble Diversity, Accuracy and Robustness Machine Learning

Ensemble learning is a methodology that integrates multiple DNN learners to improve on the prediction performance of individual learners. Diversity is greater when the errors of the ensemble predictions are more uniformly distributed. Greater diversity is highly correlated with the increase in ensemble accuracy. Another attractive property of diversity-optimized ensemble learning is its robustness against deception: an adversarial perturbation attack can mislead one DNN model to misclassify but may not fool other ensemble DNN members consistently. In this paper we first give an overview of the concept of ensemble diversity and examine the three types of ensemble diversity in the context of DNN classifiers. We then describe a set of ensemble diversity measures, a suite of algorithms for creating diversity ensembles and for performing ensemble consensus (voted or learned) for generating high accuracy ensemble output by strategically combining outputs of individual members. This paper concludes with a discussion on a set of open issues in quantifying ensemble diversity for robust deep learning.
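
The voted form of ensemble consensus mentioned above can be sketched with plain majority voting. This is a minimal toy example, not the paper's algorithms: the member predictions and labels below are invented, and the helper names are hypothetical. Each of the three members makes one error, but on a different input, so the vote recovers every label.

```python
from collections import Counter


def majority_vote(member_predictions):
    """Combine per-member label lists (all the same length) by plurality vote.

    Ties are broken in favor of the label seen first among the members.
    """
    n_samples = len(member_predictions[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in member_predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


# Invented toy data: every member errs once, on a different sample.
labels = [0, 1, 1, 0, 1]
members = [
    [0, 1, 1, 0, 0],  # wrong on the last sample
    [0, 1, 0, 0, 1],  # wrong on the third sample
    [1, 1, 1, 0, 1],  # wrong on the first sample
]

ensemble = majority_vote(members)
member_accuracies = [accuracy(m, labels) for m in members]
ensemble_accuracy = accuracy(ensemble, labels)
print(member_accuracies, ensemble_accuracy)
```

If the three members had all erred on the same sample, the vote would have reproduced that error. This is the intuition behind preferring members whose errors are spread across different inputs.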

Minimum Description Length Revisited Machine Learning

This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was originally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since 2007, the last time an extensive overview was written. These include new methods for model selection and averaging and hypothesis testing, as well as the first completely general definition of MDL estimators. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC vs BIC and cross-validation vs Bayes can, to a large extent, be viewed from a unified perspective.
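
For readers unfamiliar with the two criteria named above, their standard textbook definitions are recalled here. The abstract itself does not state them, and the symbols are introduced only for this note. For a model with $k$ free parameters, maximized likelihood $\hat{L}$, and $n$ observations:

$$
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}.
$$

Both are penalized log-likelihoods that differ only in the penalty per parameter: $2$ for AIC and $\ln n$ for BIC, so BIC penalizes complexity more heavily once $n \geq 8$.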