Bayesian Learning
16. Appendix: Mathematics for Deep Learning -- Dive into Deep Learning 0.7 documentation
One of the wonderful parts of modern deep learning is that much of it can be understood and used without a full understanding of the mathematics beneath it. This is a sign that the field is maturing. Most software developers no longer need to worry about the theory of computable functions, or whether programming languages without a goto can emulate programming languages with a goto with at most constant overhead, and neither should the deep learning practitioner need to worry about the theoretical foundations of maximum likelihood learning, if one can find an architecture to approximate a target function to an arbitrary degree of accuracy. That said, we are not quite there yet. Sometimes when building a model in practice you will need to understand how architectural choices influence gradient flow, or what assumptions you are making by training with a certain loss function.
Variational Bayesian inference of hidden stochastic processes with unknown parameters
Atitey, Komlan, Loskot, Pavel, Mihaylova, Lyudmila
Estimating hidden processes from non-linear noisy observations is particularly difficult when the parameters of these processes are not known. This paper adopts a machine learning approach to devise variational Bayesian inference for such scenarios. In particular, a random process generated by the autoregressive moving average (ARMA) linear model is inferred from non-linear noisy observations. The posterior distribution of hidden states is approximated by a set of weighted particles generated by the sequential Monte Carlo (SMC) algorithm involving sequential importance sampling with resampling (SISR). Numerical efficiency and estimation accuracy of the proposed inference method are evaluated by computer simulations. Furthermore, the proposed inference method is demonstrated on a practical problem of estimating the missing values in gene expression time series, assuming a vector autoregressive (VAR) data model.
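The weighted-particle approximation mentioned in the abstract can be illustrated with a minimal bootstrap particle filter. This is only a sketch under simplifying assumptions, not the paper's method: the hidden state follows an AR(1) model with known parameters (the paper treats ARMA models with unknown parameters and adds variational Bayesian inference), the observation is a tanh of the state plus Gaussian noise, and all names and parameter values (`simulate`, `sisr_filter`, `a`, `q`, `r`) are hypothetical.

```python
import math
import random


def simulate(n_steps, a=0.9, q=0.5, r=0.3, seed=1):
    """Simulate a hidden AR(1) state and non-linear noisy observations."""
    rng = random.Random(seed)
    x, states, observations = 0.0, [], []
    for _ in range(n_steps):
        x = a * x + rng.gauss(0.0, q)  # hidden state transition
        states.append(x)
        observations.append(math.tanh(x) + rng.gauss(0.0, r))  # non-linear observation
    return states, observations


def sisr_filter(observations, n_particles=500, a=0.9, q=0.5, r=0.3, seed=0):
    """Bootstrap particle filter: propagate, weight, estimate, resample."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    posterior_means = []
    for y in observations:
        # 1. Importance sampling: propagate particles through the state model.
        particles = [a * x + rng.gauss(0.0, q) for x in particles]
        # 2. Weight each particle by the observation likelihood N(y; tanh(x), r^2).
        log_w = [-0.5 * ((y - math.tanh(x)) / r) ** 2 for x in particles]
        max_log_w = max(log_w)  # subtract the maximum for numerical stability
        weights = [math.exp(lw - max_log_w) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        # 3. Posterior mean estimate from the weighted particles.
        posterior_means.append(sum(w * x for w, x in zip(weights, particles)))
        # 4. Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return posterior_means
```

Calling `states, observations = simulate(50)` followed by `sisr_filter(observations)` returns one posterior-mean estimate per time step, which can be compared against `states`. Multinomial resampling is the simplest choice here; systematic or stratified resampling usually has lower variance.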
Sparse inversion for derivative of log determinant
Zhu, Shengxin, Wathen, Andrew J
Algorithms for Gaussian processes, marginal likelihood methods, or restricted maximum likelihood methods often require derivatives of log determinant terms. These log determinants are usually parameterized by the variance parameters of the underlying statistical models. This paper demonstrates how, when the underlying matrix is sparse, to take advantage of sparse inversion (selected inversion, which shares the same sparsity pattern as the original matrix) to accelerate evaluation of the derivative of the log determinant.
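The idea rests on the standard identity d/dθ log det A(θ) = tr(A(θ)^{-1} ∂A/∂θ): when ∂A/∂θ has the same sparsity pattern as A, only the entries of the inverse on that pattern are needed. The sketch below illustrates this for the simplest sparse case, a symmetric positive-definite tridiagonal matrix, using an LDL^T factorization and a Takahashi-style backward recurrence. It is an illustration under these assumptions, not the paper's algorithm, and all function names are hypothetical.

```python
import math


def ldl_tridiagonal(diag, off):
    """LDL^T factors of a symmetric positive-definite tridiagonal matrix."""
    n = len(diag)
    d = [0.0] * n        # diagonal of D
    l = [0.0] * (n - 1)  # sub-diagonal of the unit lower-bidiagonal L
    d[0] = diag[0]
    for i in range(n - 1):
        l[i] = off[i] / d[i]
        d[i + 1] = diag[i + 1] - l[i] * off[i]
    return d, l


def logdet(diag, off):
    """log det A equals the sum of log D_ii."""
    d, _ = ldl_tridiagonal(diag, off)
    return sum(math.log(x) for x in d)


def selected_inverse(diag, off):
    """Entries of A^{-1} on the tridiagonal pattern only (backward recurrence)."""
    d, l = ldl_tridiagonal(diag, off)
    n = len(diag)
    z_diag = [0.0] * n
    z_off = [0.0] * (n - 1)
    z_diag[n - 1] = 1.0 / d[n - 1]
    for i in range(n - 2, -1, -1):
        z_off[i] = -l[i] * z_diag[i + 1]
        z_diag[i] = 1.0 / d[i] - l[i] * z_off[i]
    return z_diag, z_off


def dlogdet(diag, off, ddiag, doff):
    """tr(A^{-1} dA) using only the selected entries of the inverse."""
    z_diag, z_off = selected_inverse(diag, off)
    on_diagonal = sum(z * g for z, g in zip(z_diag, ddiag))
    off_diagonal = 2.0 * sum(z * g for z, g in zip(z_off, doff))  # symmetric pairs
    return on_diagonal + off_diagonal


def example_matrix(theta, n=6):
    """Hypothetical parametric matrix: diagonal 2 + theta, off-diagonal -theta / 2."""
    return [2.0 + theta] * n, [-0.5 * theta] * (n - 1)
```

For `example_matrix`, the derivative of the diagonal is 1 and of the off-diagonal is -0.5, so `dlogdet(*example_matrix(1.0), [1.0] * 6, [-0.5] * 5)` can be checked against a central finite difference of `logdet`. The full inverse is never formed, which is the point of selected inversion.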
On Modelling Label Uncertainty in Deep Neural Networks: Automatic Estimation of Intra-observer Variability in 2D Echocardiography Quality Assessment
Liao, Zhibin, Girgis, Hany, Abdi, Amir, Vaseli, Hooman, Hetherington, Jorden, Rohling, Robert, Gin, Ken, Tsang, Teresa, Abolmaesumi, Purang
Uncertainty of labels in clinical data resulting from intra-observer variability can have a direct impact on the reliability of assessments made by deep neural networks. In this paper, we propose a method for modelling such uncertainty in the context of 2D echocardiography (echo), which is a routine procedure for detecting cardiovascular disease at point-of-care. Echo imaging quality and acquisition time are highly dependent on the operator's experience level. Recent developments have shown the possibility of automating echo image quality quantification by mapping an expert's assessment of quality to the echo image via deep learning techniques. Nevertheless, the observer variability in the expert's assessment can impact the quality quantification accuracy. Here, we aim to model the intra-observer variability in echo quality assessment as an aleatoric uncertainty modelling regression problem with the introduction of a novel method that handles the regression problem with categorical labels. A key feature of our design is that only a single forward pass is sufficient to estimate the level of uncertainty for the network output. The simplicity of the proposed approach means that it could be generalized to other applications of deep learning in medical imaging, where there is often uncertainty in clinical labels.
Z. Liao and H. Girgis have contributed equally to this work. T. Tsang and P. Abolmaesumi have contributed equally to the manuscript (email: t.tsang@ubc.ca). Z. Liao, A. Abdi, H. Vaseli, and J. Hetherington are with the Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada. H. Girgis, T. Tsang, and K. Gin are with Vancouver General Hospital Echocardiography Laboratory, Division of Cardiology, Department of Medicine, The University of British Columbia, Vancouver, BC V5Z 1M9, Canada. R. Rohling is with the Department of Electrical and Computer Engineering and the Department of Mechanical Engineering, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada. T. Tsang is the Director of the Vancouver General Hospital and University of British Columbia Echocardiography Laboratories, and Principal Investigator of the CIHR-NSERC grant supporting this work. P. Abolmaesumi is Co-Principal Investigator for the grant supporting this work and is with the Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada.
How Bayes' Theorem is Applied in Machine Learning - KDnuggets
In the previous post we saw what Bayes' Theorem is, and went through an easy, intuitive example of how it works. If you don't know what Bayes' Theorem is and have not yet had the pleasure of reading that post, I recommend you do, as it will make this article a lot easier to understand. In this post, we will see the uses of this theorem in Machine Learning. As mentioned in the previous post, Bayes' theorem tells us how to gradually update our knowledge about something as we get more evidence or data about it.
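The gradual updating described here can be sketched in a few lines. The example is an illustration only, not taken from the article: it assumes a coin whose probability of heads is one of three hypothetical candidate values, starts from a uniform prior, and applies Bayes' theorem once per observed flip, so that each posterior becomes the prior for the next observation.

```python
def bayes_update(prior, likelihoods):
    """Return the posterior P(h | e) from a prior P(h) and likelihoods P(e | h)."""
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(e), the normalizing constant
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical hypotheses: candidate values for the coin's probability of heads.
biases = [0.25, 0.5, 0.75]
belief = {b: 1.0 / len(biases) for b in biases}  # uniform prior

# Update the belief one flip at a time; each posterior is the next prior.
for flip in "HHT":
    likelihoods = {b: (b if flip == "H" else 1.0 - b) for b in biases}
    belief = bayes_update(belief, likelihoods)
```

After two heads and one tail, the belief shifts toward the larger biases: roughly 0.15, 0.40, and 0.45 for the hypotheses 0.25, 0.5, and 0.75.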
Understanding Causal Inference
This article covers causal relationships and includes a chapter excerpt from the book Machine Learning in Production: Developing and Optimizing Data Science Workflows and Applications by Andrew Kelleher and Adam Kelleher. A complementary Domino project is available. As data science work is experimental and probabilistic in nature, data scientists are often faced with making inferences. This may require a shift in mindset, particularly if moving from "traditional statistical analysis to causal analysis of multivariate data". As Domino is committed to providing the platform and tools data scientists need to accelerate their work, we reached out to Addison-Wesley Professional (AWP) Pearson for permission to excerpt "Causal Inference" from the book, Machine Learning in Production: Developing and Optimizing Data Science Workflows and Applications by Andrew Kelleher and Adam Kelleher. We appreciate the permission to provide the chapter excerpt below as well as to place the code within a complementary Domino project. We've introduced [in the book] a couple of machine-learning algorithms and suggested that they can be used to produce clear, interpretable results. You've seen that logistic regression coefficients can be used to say how much more likely an outcome is to occur in conjunction with a feature (for binary features) or how much more likely an outcome is to occur per unit increase in a variable (for real-valued features). We'd like to make stronger statements. We'd like to say "If you increase a variable by a unit, then it will have the effect of making an outcome more likely." These two interpretations of a regression coefficient are so similar on the surface that you may have to read them a few times to take away the meaning. The key is that in the first case, we're describing what usually happens in a system that we observe. In the second case, we're saying what will happen if we intervene in that system and disrupt it from its normal operation.
After we go through an example, we'll build up the mathematical and conceptual machinery to describe interventions. We'll cover how to go from a Bayesian network describing observational data to one that describes the effects of an intervention. We'll go through some classic approaches to estimating the effects of interventions, and finally we'll explain how to use machine-learning estimators to estimate the effects of interventions.
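The gap between observing and intervening can be made concrete with a small discrete Bayesian network. The sketch below is an illustration under assumed numbers, not code from the book excerpt: a binary confounder Z influences both a binary treatment X and a binary outcome Y, and every probability in the tables is hypothetical. The observational quantity conditions on X, while the interventional quantity removes the Z → X edge and averages over the marginal distribution of Z (the standard backdoor adjustment).

```python
# Hypothetical network Z -> X, Z -> Y, X -> Y with binary variables.
p_z = {0: 0.5, 1: 0.5}                  # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}         # P(X = 1 | Z = z)
p_y1_given_xz = {                       # P(Y = 1 | X = x, Z = z), keyed by (x, z)
    (0, 0): 0.1, (1, 0): 0.2,
    (0, 1): 0.5, (1, 1): 0.6,
}


def p_x_given_z(x, z):
    """P(X = x | Z = z) for binary X."""
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def observational(x):
    """P(Y = 1 | X = x): what we see when X is merely observed."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    joint = sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z)
    return joint / p_x


def interventional(x):
    """P(Y = 1 | do(X = x)): X is set by fiat, so Z keeps its marginal P(Z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


observed_difference = observational(1) - observational(0)
causal_effect = interventional(1) - interventional(0)
```

With these numbers the observed difference is about 0.34 while the effect of intervening is only about 0.10; the remainder is due to the confounder Z, which makes X = 1 and Y = 1 occur together more often.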
Probabilistic Model Selection with AIC, BIC, and MDL
Model selection is the problem of choosing one from among a set of candidate models. It is common to choose a model that performs the best on a hold-out test dataset or to estimate model performance using a resampling technique, such as k-fold cross-validation. An alternative approach to model selection involves using probabilistic statistical measures that attempt to quantify both the model performance on the training dataset and the complexity of the model. Examples include the Akaike and Bayesian Information Criteria and the Minimum Description Length. The benefit of these information criterion statistics is that they do not require a hold-out test set, although a limitation is that they do not take the uncertainty of the models into account and may end up selecting models that are too simple.
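As a minimal sketch of how such criteria trade fit against complexity, the functions below use the usual definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of parameters, n the number of training samples, and ln L the maximized log-likelihood. The candidate models and their log-likelihood values are made up purely for illustration, and `select_model` is a hypothetical helper.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion; lower is better."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


def select_model(candidates, n_samples, criterion="bic"):
    """Pick the candidate name with the lowest score.

    candidates maps a model name to (maximized log-likelihood, number of parameters).
    """
    def score(name):
        log_likelihood, n_params = candidates[name]
        if criterion == "aic":
            return aic(log_likelihood, n_params)
        return bic(log_likelihood, n_params, n_samples)

    return min(candidates, key=score)


# Made-up numbers: the cubic model fits slightly better but has more parameters.
candidates = {"linear": (-120.0, 2), "cubic": (-117.5, 4)}
best_by_aic = select_model(candidates, n_samples=100, criterion="aic")
best_by_bic = select_model(candidates, n_samples=100, criterion="bic")
```

With these values AIC prefers the cubic model while BIC prefers the linear one, because the BIC penalty per parameter, ln n, exceeds the AIC penalty of 2 once n is larger than about 7.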
Probabilistic Formulation of the Take The Best Heuristic
Peltola, Tomi, Jokinen, Jussi, Kaski, Samuel
The framework of cognitively bounded rationality treats problem solving as fundamentally rational, but emphasises that it is constrained by cognitive architecture and the task environment. This paper investigates a simple decision making heuristic, Take The Best (TTB), within that framework. We formulate TTB as a likelihood-based probabilistic model, where the decision strategy arises by probabilistic inference based on the training data and the model constraints. The strengths of the probabilistic formulation, in addition to providing a bounded rational account of the learning of the heuristic, include natural extensibility with additional cognitively plausible constraints and prior information, and the possibility to embed the heuristic as a subpart of a larger probabilistic model. We extend the model to learn cue discrimination thresholds for continuous-valued cues and experiment with using the model to account for biased preference feedback from a boundedly rational agent in a simulated interactive machine learning task.
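For orientation, the classic deterministic form of Take The Best can be sketched as follows: cues are ranked by validity, and a paired comparison is decided by the first cue that discriminates between the two options. This is the textbook heuristic, not the likelihood-based probabilistic formulation the paper proposes; the data set, cue names, and the 0.5 fallback validity for a cue that never discriminates are all assumptions made for the example.

```python
from itertools import combinations


def cue_validity(objects, criterion, cue):
    """Share of discriminating pairs in which the higher cue value has the higher criterion."""
    right = wrong = 0
    for a, b in combinations(objects, 2):
        if a[cue] == b[cue] or a[criterion] == b[criterion]:
            continue  # the cue (or the criterion) does not discriminate this pair
        if (a[cue] > b[cue]) == (a[criterion] > b[criterion]):
            right += 1
        else:
            wrong += 1
    # Assumption: a cue that never discriminates gets chance-level validity.
    return right / (right + wrong) if right + wrong else 0.5


def take_the_best(a, b, cue_order):
    """Return the option favoured by the first discriminating cue, or None if none does."""
    for cue in cue_order:
        if a[cue] != b[cue]:
            return a if a[cue] > b[cue] else b
    return None


# Hypothetical training data: which city is larger?
cities = [
    {"name": "A", "population": 900, "capital": 1, "airport": 1},
    {"name": "B", "population": 500, "capital": 0, "airport": 1},
    {"name": "C", "population": 200, "capital": 0, "airport": 0},
    {"name": "D", "population": 100, "capital": 0, "airport": 1},
]
cues = ["airport", "capital"]
cue_order = sorted(cues, key=lambda c: cue_validity(cities, "population", c), reverse=True)
```

Here `cue_order` places `capital` before `airport`, so `take_the_best(cities[1], cities[2], cue_order)` skips the non-discriminating `capital` cue and picks city B on the `airport` cue, while cities B and D cannot be separated and yield `None`.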
Aerodynamic Data Fusion Towards the Digital Twin Paradigm
Renganathan, S. Ashwin, Harada, Kohei, Mavris, Dimitri N.
We consider the fusion of two aerodynamic data sets originating from differing fidelity physical or computer experiments. We specifically address the fusion of: 1) noisy and incomplete fields from wind tunnel measurements and 2) deterministic but biased fields from numerical simulations. These two data sources are fused in order to estimate the true field that best matches measured quantities that serve as the ground truth. For example, two sources of pressure fields about an aircraft are fused based on measured forces and moments from a wind-tunnel experiment. A fundamental challenge in this problem is that the true field is unknown and cannot be estimated with 100% certainty. We employ a Bayesian framework to infer the true fields conditioned on measured quantities of interest; essentially we perform a statistical correction to the data. The fused data may then be used to construct more accurate surrogate models suitable for early stages of aerospace design. We also introduce an extension of the Proper Orthogonal Decomposition (POD) with constraints to solve the same problem. Both methods are demonstrated on fusing the pressure distributions for flow past the RAE2822 airfoil and the Common Research Model wing at transonic conditions. Comparison of both methods reveals that the Bayesian method is more robust when data is scarce, while also being capable of accounting for uncertainties in the data. Furthermore, given adequate data, the POD-based and Bayesian approaches lead to similar results.
Learning Deep Bayesian Latent Variable Regression Models that Generalize: When Non-identifiability is a Problem
Yacoby, Yaniv, Pan, Weiwei, Doshi-Velez, Finale
Bayesian Neural Networks with Latent Variables (BNN+LVs) provide uncertainties in prediction estimates by explicitly modeling model uncertainty (via priors on network weights) and environmental stochasticity (via a latent input noise variable). In this work, we first show that BNN+LV suffers from a serious form of non-identifiability: explanatory power can be transferred between model parameters and input noise while fitting the data equally well. We demonstrate that, as a result, traditional inference methods may yield parameters that reconstruct observed data well but generalize poorly. Next, we develop a novel inference procedure that explicitly mitigates the effects of likelihood non-identifiability during training and yields high-quality predictions as well as uncertainty estimates. We demonstrate that our inference method improves upon benchmark methods across a range of synthetic and real datasets.