Bayesian Inference
Estimating Model Uncertainty of Neural Networks in Sparse Information Form
Lee, Jongseok, Humt, Matthias, Feng, Jianxiang, Triebel, Rudolph
We present a sparse representation of model uncertainty for Deep Neural Networks (DNNs) where the parameter posterior is approximated with an inverse formulation of the Multivariate Normal Distribution (MND), also known as the information form. The key insight of our work is that the information matrix, i.e. the inverse of the covariance matrix tends to be sparse in its spectrum. Therefore, dimensionality reduction techniques such as low rank approximations (LRA) can be effectively exploited. To achieve this, we develop a novel sparsification algorithm and derive a cost-effective analytical sampler. As a result, we show that the information form can be scalably applied to represent model uncertainty in DNNs. Our exhaustive theoretical analysis and empirical evaluations on various benchmarks show the competitiveness of our approach over the current methods.
Learning Minimax Estimators via Online Learning
Gupta, Kartik, Suggala, Arun Sai, Prasad, Adarsh, Netrapalli, Praneeth, Ravikumar, Pradeep
We consider the problem of designing minimax estimators for estimating the parameters of a probability distribution. Unlike classical approaches such as the MLE and minimum distance estimators, we consider an algorithmic approach for constructing such estimators. We view the problem of designing minimax estimators as finding a mixed strategy Nash equilibrium of a zero-sum game. By leveraging recent results in online learning with non-convex losses, we provide a general algorithm for finding a mixed-strategy Nash equilibrium of general non-convex non-concave zero-sum games. Our algorithm requires access to two subroutines: (a) one which outputs a Bayes estimator corresponding to a given prior probability distribution, and (b) one which computes the worst-case risk of any given estimator. Given access to these two subroutines, we show that our algorithm outputs both a minimax estimator and a least favorable prior. To demonstrate the power of this approach, we use it to construct provably minimax estimators for classical problems such as estimation in the finite Gaussian sequence model, and linear regression.
A Non-Iterative Quantile Change Detection Method in Mixture Model with Heavy-Tailed Components
Li, Yuantong, Ma, Qi, Ghosh, Sujit K.
Estimating parameters of mixture model has wide applications Determining the number of components in a finite mixture model ranging from classification problems to estimating of complex distributions. is crucial in many application areas such as financial data [16, 31, 35], Most of the current literature on estimating the parameters biomedical studies [17, 36] and low-frequency accident occurrence of the mixture densities are based on iterative Expectation Maximization prediction [27, 32]. Existing literature have witnessed numerous (EM) type algorithms which require the use of either computational methods, and in particular Markov Chain Monte taking expectations over the latent label variables or generating Carlo methods [7, 14, 33] and EM algorithms [20-22] have been samples from the conditional distribution of such latent labels using used with a lot of success. However, either these methods are computationally the Bayes rule. Moreover, when the number of components is demanding and/or these methods are developed under unknown, the problem becomes computationally more demanding the assumption of data being generated from mixtures of densities due to well-known label switching issues [28]. In this paper, we from the exponential family, in part because the family of exponential propose a robust and quick approach based on change-point methods distribution has a sufficient statistic of constant dimension (i.e., to determine the number of mixture components that works the dimension of the sufficient statistic remains fixed for any sample for almost any location-scale families even when the components size) and so the updates of the data augmentation type algorithm are heavy tailed (e.g., Cauchy). We present several numerical illustrations involve their smaller dimensional sufficient statistics [11, 12, 24].
Fast Matrix Square Roots with Applications to Gaussian Processes and Bayesian Optimization
Pleiss, Geoff, Jankowiak, Martin, Eriksson, David, Damle, Anil, Gardner, Jacob R.
Matrix square roots and their inverses arise frequently in machine learning, e.g., when sampling from high-dimensional Gaussians $\mathcal{N}(\mathbf 0, \mathbf K)$ or whitening a vector $\mathbf b$ against covariance matrix $\mathbf K$. While existing methods typically require $O(N^3)$ computation, we introduce a highly-efficient quadratic-time algorithm for computing $\mathbf K^{1/2} \mathbf b$, $\mathbf K^{-1/2} \mathbf b$, and their derivatives through matrix-vector multiplication (MVMs). Our method combines Krylov subspace methods with a rational approximation and typically achieves $4$ decimal places of accuracy with fewer than $100$ MVMs. Moreover, the backward pass requires little additional computation. We demonstrate our method's applicability on matrices as large as $50,\!000 \times 50,\!000$ - well beyond traditional methods - with little approximation error. Applying this increased scalability to variational Gaussian processes, Bayesian optimization, and Gibbs sampling results in more powerful models with higher accuracy.
Maximum likelihood estimation for Machine Learning - Nucleusbox
In the Logistic Regression for Machine Learning using Python blog, I have introduced the basic idea of the logistic function. We have discussed the cost function. And in the iterative method, we focus on the Gradient descent optimization method. Now so in this section, we are going to introduce the Maximum Likelihood cost function. And we would like to maximize this cost function.
Quantifying Assurance in Learning-enabled Systems
Asaadi, Erfan, Denney, Ewen, Pai, Ganesh
Dependability assurance of systems embedding machine learning(ML) components---so called learning-enabled systems (LESs)---is a key step for their use in safety-critical applications. In emerging standardization and guidance efforts, there is a growing consensus in the value of using assurance cases for that purpose. This paper develops a quantitative notion of assurance that an LES is dependable, as a core component of its assurance case, also extending our prior work that applied to ML components. Specifically, we characterize LES assurance in the form of assurance measures: a probabilistic quantification of confidence that an LES possesses system-level properties associated with functional capabilities and dependability attributes. We illustrate the utility of assurance measures by application to a real world autonomous aviation system, also describing their role both in i) guiding high-level, runtime risk mitigation decisions and ii) as a core component of the associated dynamic assurance case.
Probabilistic Safety for Bayesian Neural Networks
Wicker, Matthew, Laurenti, Luca, Patane, Andrea, Kwiatkowska, Marta
We study probabilistic safety for Bayesian Neural Networks (BNNs) under adversarial input perturbations. Given a compact set of input points, $T \subseteq \mathbb{R}^m$, we study the probability w.r.t. the BNN posterior that all the points in $T$ are mapped to the same region $S$ in the output space. In particular, this can be used to evaluate the probability that a network sampled from the BNN is vulnerable to adversarial attacks. We rely on relaxation techniques from non-convex optimization to develop a method for computing a lower bound on probabilistic safety for BNNs, deriving explicit procedures for the case of interval and linear function propagation techniques. We apply our methods to BNNs trained on a regression task, airborne collision avoidance, and MNIST, empirically showing that our approach allows one to certify probabilistic safety of BNNs with millions of parameters.
MARS: Masked Automatic Ranks Selection in Tensor Decompositions
Kodryan, Maxim, Kropotov, Dmitry, Vetrov, Dmitry
For instance, Tucker (Tucker, Tensor decomposition methods have recently 1966) and canonical polyadic (CP) (Caroll & Chang, 1970) proven to be efficient for compressing and accelerating tensor decompositions are widely known for compressing neural networks. However, the problem and accelerating convolutional networks (Lebedev of optimal decomposition structure determination et al., 2015; Kim et al., 2016; Kossaifi et al., 2019), and is still not well studied while being quite important. Tensor Train (TT) (Oseledets, 2011) decomposition has Specifically, decomposition ranks present been successfully applied for compressing fully-connected the crucial parameter controlling the compressionaccuracy (FC) (Novikov et al., 2015), convolutional (Garipov et al., tradeoff. In this paper, we introduce 2016), recurrent (Yang et al., 2017; Yu et al., 2017), embedding MARS -- a new efficient method for the automatic (Khrulkov et al., 2019) layers.
Likelihood-Free Inference with Deep Gaussian Processes
Aushev, Alexander, Pesonen, Henri, Heinonen, Markus, Corander, Jukka, Kaski, Samuel
In recent years, surrogate models have been successfully used in likelihood-free inference to decrease the number of simulator evaluations. The current state-of-the-art performance for this task has been achieved by Bayesian Optimization with Gaussian Processes (GPs). While this combination works well for unimodal target distributions, it is restricting the flexibility and applicability of Bayesian Optimization for accelerating likelihood-free inference more generally. We address this problem by proposing a Deep Gaussian Process (DGP) surrogate model that can handle more irregularly behaved target distributions. Our experiments show how DGPs can outperform GPs on objective functions with multimodal distributions and maintain a comparable performance in unimodal cases. This confirms that DGPs as surrogate models can extend the applicability of Bayesian Optimization for likelihood-free inference (BOLFI), while adding computational overhead that remains negligible for computationally intensive simulators.
GAT-GMM: Generative Adversarial Training for Gaussian Mixture Models
Farnia, Farzan, Wang, William, Das, Subhro, Jadbabaie, Ali
Learning the distribution of observed data is a basic task in unsupervised learning which has been studied for decades. The recently-introduced concept of Generative Adversarial Networks (GANs) [1] has demonstrated great success in various distribution learning tasks. Unlike the traditional maximum-likelihood-based approaches, GANs learn the distribution of observed data through a zero-sum game between two machine players, a generator G mimicking the true distribution of data and a discriminator D distinguishing the generator's produced samples from real data points. This zero-sum game is typically formulated through a minimax optimization problem where G and D optimize a minimax objective quantifying how dissimilar G's generated samples and real training samples are. In GAN minimax optimization problems, the generator and discriminator functions are commonly chosen as two deep neural networks (DNNs). Leveraging the expressive power of DNNs, GANs have achieved state-of-the-art performance in learning complex distributions of image data [2, 3, 4]. This success, however, is achieved at the cost of their notoriously difficult training procedure which has introduced several challenges to the machine learning community. Addressing these challenges requires a deeper theoretical understanding of GANs, including their approximation, generalization, and optimization properties. Specifically, GANs have been frequently observed to fail in learning multi-modal distributions [5].