Bayesian Inference
What is Bayes Theorem?
If you've been learning about data science or machine learning, there's a good chance you've heard the terms "Bayes Theorem" or "Bayes classifier" before. These concepts can be somewhat confusing, especially if you aren't used to thinking of probability from a traditional, frequentist statistics perspective. This article will attempt to explain the principles behind Bayes Theorem and how it's used in machine learning. Bayes Theorem is a method of calculating conditional probability. The traditional method of calculating conditional probability (the probability that one event occurs given the occurrence of a different event) is to use the conditional probability formula: calculate the joint probability of event one and event two occurring at the same time, then divide it by the probability of event two occurring.
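As a minimal sketch of the two routes to a conditional probability described above, the Python snippet below computes P(A | B) once from the joint probability and once through Bayes' theorem, P(A | B) = P(B | A) P(A) / P(B). The function names and all of the numbers are hypothetical, chosen only for illustration.

```python
def conditional_probability(p_joint, p_b):
    """Conditional probability formula: P(A | B) = P(A and B) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_joint / p_b


def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers: A = "has the condition", B = "test is positive".
p_a = 0.01              # prior P(A)
p_b_given_a = 0.95      # P(B | A)
p_b_given_not_a = 0.05  # P(B | not A)

# Law of total probability gives P(B); the product rule gives P(A and B).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_joint = p_b_given_a * p_a

direct = conditional_probability(p_joint, p_b)
via_bayes = bayes_posterior(p_b_given_a, p_a, p_b)
print(f"P(A | B) from the joint probability: {direct:.4f}")
print(f"P(A | B) from Bayes' theorem:        {via_bayes:.4f}")
```

Both routes give the same value because the joint probability P(A and B) equals P(B | A) P(A). Bayes' theorem is useful when P(B | A) and the prior P(A) are easier to obtain than the joint probability itself.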
Interval Neural Networks: Uncertainty Scores
Oala, Luis, Heiß, Cosmas, Macdonald, Jan, März, Maximilian, Samek, Wojciech, Kutyniok, Gitta
We propose a fast, non-Bayesian method for producing uncertainty scores in the output of pre-trained deep neural networks (DNNs) using a data-driven interval propagating network. This interval neural network (INN) has interval valued parameters and propagates its input using interval arithmetic. The INN produces sensible lower and upper bounds encompassing the ground truth. We provide theoretical justification for the validity of these bounds. Furthermore, its asymmetric uncertainty scores offer additional, directional information beyond what Gaussian-based, symmetric variance estimation can provide. We find that noise in the data is adequately captured by the intervals produced with our method. In numerical experiments on an image reconstruction task, we demonstrate the practical utility of INNs as a proxy for the prediction error in comparison to two state-of-the-art uncertainty quantification methods. In summary, INNs produce fast, theoretically justified uncertainty scores for DNNs that are easy to interpret, come with added information and pose as improved error proxies - features that may prove useful in advancing the usability of DNNs especially in sensitive applications such as health care.
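To illustrate what propagating an input with interval arithmetic can look like, here is a minimal Python sketch of one affine layer with interval-valued weights and biases followed by a ReLU. It shows generic interval arithmetic only and is not the authors' INN architecture or training procedure. The function names, the layer, and every number are assumptions made for the example.

```python
import numpy as np


def interval_affine(x_lo, x_hi, w_lo, w_hi, b_lo, b_hi):
    """Propagate the box [x_lo, x_hi] through y = W x + b with interval W and b."""
    # For scalars w in [w_lo, w_hi] and x in [x_lo, x_hi], the product w * x is
    # bounded by the smallest and largest of the four corner products.
    corners = np.stack([w_lo * x_lo, w_lo * x_hi, w_hi * x_lo, w_hi * x_hi])
    y_lo = corners.min(axis=0).sum(axis=1) + b_lo
    y_hi = corners.max(axis=0).sum(axis=1) + b_hi
    return y_lo, y_hi


def interval_relu(lo, hi):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)


# Hypothetical layer: point weights and biases widened into intervals.
w = np.array([[1.0, -2.0], [0.5, 0.5]])
b = np.array([0.1, -0.2])
w_lo, w_hi = w - 0.1, w + 0.1
b_lo, b_hi = b - 0.05, b + 0.05

# Hypothetical input with a small uncertainty band around x = (1, 2).
x = np.array([1.0, 2.0])
x_lo, x_hi = x - 0.1, x + 0.1

out_lo, out_hi = interval_relu(*interval_affine(x_lo, x_hi, w_lo, w_hi, b_lo, b_hi))
point = np.maximum(w @ x + b, 0.0)  # output of the ordinary point-valued layer
print("lower:", out_lo, "point:", point, "upper:", out_hi)
```

The lower and upper bounds need not be symmetric around the point prediction, which is the kind of directional information the abstract contrasts with symmetric variance estimates.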
Bayesian Sparsification Methods for Deep Complex-valued Networks
Nazarov, Ivan, Burnaev, Evgeny
Deep neural networks are an integral part of the machine learning and data science toolset for practical data-driven problem solving. With continual miniaturization, ever more applications can be found in embedded systems. Common embedded applications include on-device image recognition and signal processing. Despite recent advances in generalization and optimization theory specific to deep networks, deploying on actual embedded hardware remains a challenge due to storage, real-time throughput, and arithmetic complexity restrictions [He et al., 2018]. Therefore, compression methods for achieving high model sparsity and numerical efficiency without losing much in performance are especially relevant.
VaB-AL: Incorporating Class Imbalance and Difficulty with Variational Bayes for Active Learning
Choi, Jongwon, Yi, Kwang Moo, Kim, Jihoon, Choo, Jincho, Kim, Byoungjip, Chang, Jin-Yeop, Gwon, Youngjune, Chang, Hyung Jin
Active Learning for discriminative models has largely been studied with the focus on individual samples, with less emphasis on how classes are distributed or which classes are hard to deal with. In this work, we show that this is harmful. We propose a method based on Bayes' rule that can naturally incorporate class imbalance into the Active Learning framework. We derive that three terms should be considered together when estimating the probability of a classifier making a mistake for a given sample: i) probability of mislabelling a class, ii) likelihood of the data given a predicted class, and iii) the prior probability on the abundance of a predicted class. Implementing these terms requires a generative model and an intractable likelihood estimation. Therefore, we train a Variational Auto Encoder (VAE) for this purpose. To further tie the VAE with the classifier and facilitate VAE training, we use the classifier's deep feature representations as input to the VAE. By considering all three probabilities, among them especially the data imbalance, we can substantially improve the potential of existing methods under a limited data budget. We show that our method can be applied to classification tasks on multiple different datasets -- including one that is a real-world dataset with heavy data imbalance -- significantly outperforming the state of the art.
Uncertainty Estimation in Cancer Survival Prediction
Loya, Hrushikesh, Poduval, Pranav, Anand, Deepak, Kumar, Neeraj, Sethi, Amit
Survival models are used in various fields, such as the development of cancer treatment protocols. Although many statistical and machine learning models have been proposed to achieve accurate survival predictions, little attention has been paid to obtaining well-calibrated uncertainty estimates associated with each prediction. The currently popular models are opaque and untrustworthy in that they often express high confidence even on those test cases that are not similar to the training samples, and even when their predictions are wrong. We propose a Bayesian framework for survival models that not only gives more accurate survival predictions but also quantifies the survival uncertainty better. Our approach is a novel combination of variational inference for uncertainty estimation, neural multi-task logistic regression for estimating nonlinear and time-varying risk models, and an additional sparsity-inducing prior to work with high dimensional data.
Markovian Score Climbing: Variational Inference with KL(p||q)
Naesseth, Christian A., Lindsten, Fredrik, Blei, David
Modern variational inference (VI) uses stochastic gradients to avoid intractable expectations, enabling large-scale probabilistic inference in complex models. VI posits a family of approximating distributions $q$ and then finds the member of that family that is closest to the exact posterior $p$. Traditionally, VI algorithms minimize the "exclusive KL" KL$(q\|p)$, often for computational convenience. Recent research, however, has also focused on the "inclusive KL" KL$(p\|q)$, which has good statistical properties that make it more appropriate for certain inference problems. This paper develops a simple algorithm for reliably minimizing the inclusive KL. Consider a valid MCMC method, a Markov chain whose stationary distribution is $p$. The algorithm we develop iteratively samples the chain $z[k]$, and then uses those samples to follow the score function of the variational approximation, $\nabla \log q(z[k])$, with a Robbins-Monro step-size schedule. This method, which we call Markovian score climbing (MSC), converges to a local optimum of the inclusive KL. It does not suffer from the systematic errors inherent in existing methods, such as Reweighted Wake-Sleep and Neural Adaptive Sequential Monte Carlo, which lead to bias in their final estimates. In a variant that ties the variational approximation directly to the Markov chain, MSC further provides a new algorithm that melds VI and MCMC. We illustrate convergence on a toy model and demonstrate the utility of MSC on Bayesian probit regression for classification as well as a stochastic volatility model for financial data.
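The loop described in this abstract (sample the chain, then step along the score of $q$ with a decaying step size) can be sketched on a one-dimensional toy problem. The Python code below is a sketch under stated assumptions, not the authors' implementation. The Gaussian target, the random-walk Metropolis kernel, the Gaussian family for $q$, the step-size constants, and all function names are hypothetical choices made for illustration; the abstract only requires a valid MCMC method with stationary distribution $p$.

```python
import math

import numpy as np


def log_p(z, mean=2.0, std=1.5):
    """Unnormalized log density of the toy target p (hypothetical choice)."""
    return -0.5 * ((z - mean) / std) ** 2


def mh_step(z, rng, proposal_std=2.0):
    """One random-walk Metropolis step; its stationary distribution is p."""
    proposal = z + proposal_std * rng.normal()
    accept_prob = math.exp(min(0.0, log_p(proposal) - log_p(z)))
    return proposal if rng.uniform() < accept_prob else z


def score_q(z, mu, log_sigma):
    """Gradient of log q(z) w.r.t. (mu, log_sigma) for q = N(mu, sigma^2)."""
    sigma2 = math.exp(2.0 * log_sigma)
    d_mu = (z - mu) / sigma2
    d_log_sigma = (z - mu) ** 2 / sigma2 - 1.0
    return d_mu, d_log_sigma


def markovian_score_climbing(num_iters=30000, seed=0):
    rng = np.random.default_rng(seed)
    z, mu, log_sigma = 0.0, 0.0, 0.0
    for k in range(num_iters):
        z = mh_step(z, rng)              # advance the Markov chain z[k]
        step = 0.5 / (k + 10) ** 0.7     # Robbins-Monro schedule (assumed constants)
        d_mu, d_log_sigma = score_q(z, mu, log_sigma)
        mu += step * d_mu                # ascend E_p[log q], i.e. descend KL(p||q)
        log_sigma += step * d_log_sigma
    return mu, math.exp(log_sigma)


mu_hat, sigma_hat = markovian_score_climbing()
print(f"fitted q: mean={mu_hat:.2f}, std={sigma_hat:.2f} (target: mean=2.0, std=1.5)")
```

For a Gaussian family, minimizing the inclusive KL amounts to matching the mean and variance of $p$, so in this toy setting the fitted parameters should drift toward the target's mean and standard deviation as the step size decays.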
Julia Language in Machine Learning: Algorithms, Applications, and Open Issues
Gao, Kaifeng, Tu, Jingzhi, Huo, Zenan, Mei, Gang, Piccialli, Francesco, Cuomo, Salvatore
Machine learning is driving development across many fields in science and engineering. A simple and efficient programming language could accelerate applications of machine learning in various fields. Currently, the programming languages most commonly used to develop machine learning algorithms include Python, MATLAB, and C/C++. However, none of these languages balances efficiency and simplicity well. The Julia language is a fast, easy-to-use, and open-source programming language that was originally designed for high-performance computing and balances efficiency and simplicity well. This paper summarizes the related research work and developments in the application of the Julia language in machine learning. It first surveys the popular machine learning algorithms that are developed in the Julia language. Then, it investigates applications of the machine learning algorithms implemented with the Julia language. Finally, it discusses the open issues and the potential future directions that arise in the use of the Julia language in machine learning.
Anticipatory Psychological Models for Quickest Change Detection: Human Sensor Interaction
We consider anticipatory psychological models for human decision makers and their effect on sequential decision making. From a decision theoretic point of view, such models are time inconsistent, meaning that Bellman's principle of optimality does not hold. The aim of this paper is to study how such an anxiety-based anticipatory utility can affect sequential decision making, such as quickest change detection, in multi-agent systems. We show that the interaction between anticipation-driven agents and a sequential decision maker results in an unusual (nonconvex) structure of the optimal decision policy. The methodology yields a useful mathematical framework for sensor interaction involving a human decision maker (with behavioral economics constraints) and a sensor equipped with an automated sequential detector.
Improving Calibration in Mixup-trained Deep Neural Networks through Confidence-Based Loss Functions
Maroñas, Juan, Ramos, Daniel, Paredes, Roberto
Deep Neural Networks (DNN) represent the state of the art in many tasks. However, due to their overparameterization, their generalization capabilities are in doubt and are still under study. Consequently, DNN can overfit and assign overconfident predictions, as they tend to learn highly oscillating decision thresholds. This has been shown to affect the calibration of the confidences assigned to unseen data. Data Augmentation (DA) strategies have been proposed to overcome some of these limitations. One of the most popular is Mixup, which has shown a great ability to improve the accuracy of these models. Recent work has provided evidence that Mixup also improves the uncertainty quantification and calibration of DNN. In this work, we argue and provide empirical evidence that, due to its fundamentals, Mixup does not necessarily improve calibration. Based on our observations we propose a new loss function that improves the calibration, and also sometimes the accuracy. Our loss is inspired by Bayes decision theory and introduces a new training framework for designing losses for probabilistic modelling. We provide state-of-the-art accuracy with consistent improvements in calibration performance.
Unlocking the Power of Artificial Intelligence and Big Data in Medicine
Most of the daily news and recently published scientific papers on research, innovations, and applications in artificial intelligence (AI) refer to what is known as machine learning--algorithms using massive amounts of data and various methodologies to find patterns, support decisions, make predictions, or, for the deep learning part, self-identify important features in data. However, AI is a complex concept to grasp, and most people have little understanding of what it really is. AI was founded as an academic discipline in 1956 and, despite its youth, already has a rich history [1,2]. In more than 60 years of exploration and progress, AI has become a large field of research and development involving multidisciplinary approaches to address many challenges, from theoretical frameworks, methods, and tools to real implementations, risk analysis, and impact measures. The definition of AI is a moving target and changes over time with the evolution of the field. Since its early days, the field of AI has allowed the development of many techniques for decision support and prediction, tasks usually performed by humans. As early as 1958, a perceptron was expected to be able "to walk, talk, see, write, reproduce itself and be conscious of its existence," which led to a large scientific controversy between neural network and symbolic reasoning approaches [3].