Boustati, Ayman, Akyildiz, Ömer Deniz, Damoulas, Theodoros, Johansen, Adam

We introduce a framework for inference in general state-space hidden Markov models (HMMs) under likelihood misspecification. In particular, we leverage the loss-theoretic perspective of generalized Bayesian inference (GBI) to define generalized filtering recursions in HMMs, that can tackle the problem of inference under model misspecification. In doing so, we arrive at principled procedures for robust inference against observation contamination through the $\beta$-divergence. Operationalizing the proposed framework is made possible via sequential Monte Carlo methods (SMC). The standard particle methods, and their associated convergence results, are readily generalized to the new setting. We demonstrate our approach to object tracking and Gaussian process regression problems, and observe improved performance over standard filtering algorithms.

Ji, Disi, Logan, Robert L. IV, Smyth, Padhraic, Steyvers, Mark

Recent advances in machine learning have led to increased deployment of black-box classifiers across a wide variety of applications. In many such situations there is a crucial need to assess the performance of these pre-trained models, for instance to ensure sufficient predictive accuracy, or that class probabilities are well-calibrated. Furthermore, since labeled data may be scarce or costly to collect, it is desirable for such assessment be performed in an efficient manner. In this paper, we introduce a Bayesian approach for model assessment that satisfies these desiderata. We develop inference strategies to quantify uncertainty for common assessment metrics (accuracy, misclassification cost, expected calibration error), and propose a framework for active assessment using this uncertainty to guide efficient selection of instances for labeling. We illustrate the benefits of our approach in experiments assessing the performance of modern neural classifiers (e.g., ResNet and BERT) on several standard image and text classification datasets.

This work develops rigorous theoretical basis for the fact that deep Bayesian neural network (BNN) is an effective tool for high-dimensional variable selection with rigorous uncertainty quantification. We develop new Bayesian non-parametric theorems to show that a properly configured deep BNN (1) learns the variable importance effectively in high dimensions, and its learning rate can sometimes "break" the curse of dimensionality. (2) BNN's uncertainty quantification for variable importance is rigorous, in the sense that its 95% credible intervals for variable importance indeed covers the truth 95% of the time (i.e., the Bernstein-von Mises (BvM) phenomenon). The theoretical results suggest a simple variable selection algorithm based on the BNN's credible intervals. Extensive simulation confirms the theoretical findings and shows that the proposed algorithm outperforms existing classic and neural-network-based variable selection methods, particularly in high dimensions.

In variational inference, the complicated target is approximated by a closest member relative to the Kullback-Leibler (KL) divergence in a pre-specified family of tractable densities. In many large-scale machine learning applications including clustering problems [11, 32], image classification [25, 27] and topic models [21, 7], variational inference can be orders of magnitude faster than the traditional sampling based approaches such as Markov Chain Monte Carlo (MCMC). In particular, by turning the integration, or sampling, problem into an optimization problem, variational inference can take advantage of modern optimization tools such as stochastic optimization techniques [20, 17] and distributed optimization architecture [1, 8] for further improving its efficiency. Among various approximating schemes, mean-field approximation is the most common type of variational inference that is conceptually simple, implementation-wise easy and particularly suitable for problems involving large numbers of latent variables. The word "mean-field" is originated from the mean-field theory in physics where despite complex interactions among many particles in a many (infinite) body system, all interactions to any one particle can be approximated by a single averaged effect from a "mean-field". In variational inference, by restricting the approximating family of the mean-field to be all density functions that are fully factorized over (blocks of) unknown variables, the associated optimization problem of finding a closest weih2@illinois.edu

Bayesian approaches have become increasingly popular in causal inference problems due to their conceptual simplicity, excellent performance and in-built uncertainty quantification ('posterior credible sets'). We investigate Bayesian inference for average treatment effects from observational data, which is a challenging problem due to the missing counterfactuals and selection bias. Working in the standard potential outcomes framework, we propose a data-driven modification to an arbitrary (nonparametric) prior based on the propensity score that corrects for the first-order posterior bias, thereby improving performance. We illustrate our method for Gaussian process (GP) priors using (semi-)synthetic data. Our experiments demonstrate significant improvement in both estimation accuracy and uncertainty quantification compared to the unmodified GP, rendering our approach highly competitive with the state-of-the-art.

Wu, Qingyang, Li, He, Su, Weijie, Li, Lexin, Yu, Zhou

With the widespread success of deep neural networks in science and technology, it is becoming increasingly important to quantify the uncertainty of the predictions produced by deep learning. In this paper, we introduce a new method that attaches an explicit uncertainty statement to the probabilities of classification using deep neural networks. Precisely, we view that the classification probabilities are sampled from an unknown distribution, and we propose to learn this distribution through the Dirichlet mixture that is flexible enough for approximating any continuous distribution on the simplex. We then construct credible intervals from the learned distribution to assess the uncertainty of the classification probabilities. Our approach is easy to implement, computationally efficient, and can be coupled with any deep neural network architecture. Our method leverages the crucial observation that, in many classification applications such as medical diagnosis, more than one class labels are available for each observational unit. We demonstrate the usefulness of our approach through simulations and a real data example.

Gadd, Charles W. L., Wade, Sara, Boukouvalas, Alexis

Mixtures of experts probabilistically divide the input space into regions, where the assumptions of each expert, or conditional model, need only hold locally. Combined with Gaussian process (GP) experts, this results in a powerful and highly flexible model. We focus on alternative mixtures of GP experts, which model the joint distribution of the inputs and targets explicitly. We highlight issues of this approach in multi-dimensional input spaces, namely, poor scalability and the need for an unnecessarily large number of experts, degrading the predictive performance and increasing uncertainty. We construct a novel model to address these issues through a nested partitioning scheme that automatically infers the number of components at both levels. Multiple response types are accommodated through a generalised GP framework, while multiple input types are included through a factorised exponential family structure. We show the effectiveness of our approach in estimating a parsimonious probabilistic description of both synthetic data of increasing dimension and an Alzheimer's challenge dataset.

Barbieri, Sebastiano, Perez-Concha, Oscar, Kotwal, Sradha, Gallagher, Martin, Ritchie, Angus, Jorm, Louisa

Objective: To evaluate the feasibility of using an attention-based neural network for predicting the risk of readmission within 30 days of discharge from the intensive care unit (ICU) based on longitudinal electronic medical record (EMR) data and to leverage the interpretability of the model to describe patients-at-risk. Methods: A "time-aware attention" model was trained using publicly available EMR data (MIMIC-III) associated with 45,298 ICU stays for 33,150 patients. The analysed EMR data included static (patient demographics) and timestamped variables (diagnoses, procedures, medications, and vital signs). Bayesian inference was used to compute the posterior distribution of network weights. The prediction accuracy of the proposed model was compared with several baseline models and evaluated based on average precision, AUROC, and F1-Score. Odds ratios (ORs) associated with an increased risk of readmission were computed for static variables. Diagnoses, procedures, and medications were ranked according to the associated risk of readmission. The model was also used to generate reports with predicted risk (and associated uncertainty) justified by specific diagnoses, procedures, medications, and vital signs. Results: A Bayesian ensemble of 10 time-aware attention models led to the highest predictive accuracy (average precision: 0.282, AUROC: 0.738, F1-Score: 0.353). Male gender, number of recent admissions, age, admission location, insurance type, and ethnicity were all associated with risk of readmission. A longer length of stay in the ICU was found to reduce the risk of readmission (OR: 0.909, 95% credible interval: 0.902, 0.916). Groups of patients at risk included those requiring cardiovascular or ventilatory support, those with poor nutritional state, and those for whom standard medical care was not suitable, e.g. due to contraindications to surgery or medications.

Posch, Konstantin, Steinbrener, Jan, Pilz, Jürgen

We present a novel approach for training deep neural networks in a Bayesian way. Classical, i.e. non-Bayesian, deep learning has two major drawbacks both originating from the fact that network parameters are considered to be deterministic. First, model uncertainty cannot be measured thus limiting the use of deep learning in many fields of application and second, training of deep neural networks is often hampered by overfitting. The proposed approach uses variational inference to approximate the intractable a posteriori distribution on basis of a normal prior. The variational density is designed in such a way that the a posteriori uncertainty of the network parameters is represented per network layer and depending on the estimated parameter expectation values. This way, only a few additional parameters need to be optimized compared to a non-Bayesian network. We apply this Bayesian approach to train and test the LeNet architecture on the MNIST dataset. Compared to classical deep learning, the test error is reduced by 15%. In addition, the trained model contains information about the parameter uncertainty in each layer. We show that this information can be used to calculate credible intervals for the prediction and to optimize the network architecture for a given training data set.

Pathak, Nilavra, Foulds, James, Roy, Nirmalya, Banerjee, Nilanjan, Robucci, Ryan

Modeling buildings' heat dynamics is a complex process which depends on various factors including weather, building thermal capacity, insulation preservation, and residents' behavior. Gray-box models offer a causal inference of those dynamics expressed in few parameters specific to built environments. These parameters can provide compelling insights into the characteristics of building artifacts and have various applications such as forecasting HVAC usage, indoor temperature control monitoring of built environments, etc. In this paper, we present a systematic study of modeling buildings' thermal characteristics and thus derive the parameters of built conditions with a Bayesian approach. We build a Bayesian state-space model that can adapt and incorporate buildings' thermal equations and propose a generalized solution that can easily adapt prior knowledge regarding the parameters. We show that a faster approximate approach using variational inference for parameter estimation can provide similar parameters as that of a more time-consuming Markov Chain Monte Carlo (MCMC) approach. We perform extensive evaluations on two datasets to understand the generative process and show that the Bayesian approach is more interpretable. We further study the effects of prior selection for the model parameters and transfer learning, where we learn parameters from one season and use them to fit the model in the other. We perform extensive evaluations on controlled and real data traces to enumerate buildings' parameter within a 95% credible interval.