Uncertainty
Learning with Weak Supervision from Physics and Data-Driven Constraints
Ren, Hongyu (Peking University) | Stewart, Russell (Stanford University) | Song, Jiaming (Stanford University) | Kuleshov, Volodymyr (Stanford University) | Ermon, Stefano (Stanford University)
In many applications of machine learning, labeled data is scarce and obtaining additional labels is expensive. We introduce a new approach to supervising learning algorithms without labels by enforcing a small number of domain-specific constraints over the algorithmsโ outputs. The constraints can be provided explicitly based on prior knowledge โ e.g. we may require that objects detected in videos satisfy the laws of physics โ or implicitly extracted from data using a novel framework inspired by adversarial training. We demonstrate the effectiveness of constraint-based learning on a variety of tasks โ including tracking, object detection, and human pose estimation โ and we find that algorithms supervised with constraints achieve high accuracies with only a small amount of labels, or with no labels at all in some cases.
Rectified Gaussian Scale Mixtures and the Sparse Non-Negative Least Squares Problem
Nalci, Alican, Fedorov, Igor, Al-Shoukairi, Maher, Liu, Thomas T., Rao, Bhaskar D.
In this paper, we develop a Bayesian evidence maximization framework to solve the sparse non-negative least squares (S-NNLS) problem. We introduce a family of probability densities referred to as the Rectified Gaussian Scale Mixture (R- GSM) to model the sparsity enforcing prior distribution for the solution. The R-GSM prior encompasses a variety of heavy-tailed densities such as the rectified Laplacian and rectified Student- t distributions with a proper choice of the mixing density. We utilize the hierarchical representation induced by the R-GSM prior and develop an evidence maximization framework based on the Expectation-Maximization (EM) algorithm. Using the EM based method, we estimate the hyper-parameters and obtain a point estimate for the solution. We refer to the proposed method as rectified sparse Bayesian learning (R-SBL). We provide four R- SBL variants that offer a range of options for computational complexity and the quality of the E-step computation. These methods include the Markov chain Monte Carlo EM, linear minimum mean-square-error estimation, approximate message passing and a diagonal approximation. Using numerical experiments, we show that the proposed R-SBL method outperforms existing S-NNLS solvers in terms of both signal and support recovery performance, and is also very robust against the structure of the design matrix.
Kinetic Compressive Sensing
Scipioni, Michele, Santarelli, Maria F., Landini, Luigi, Catana, Ciprian, Greve, Douglas N., Price, Julie C., Pedemonte, Stefano
Parametric images provide insight into the spatial distribution of physiological parameters, but they are often extremely noisy, due to low SNR of tomographic data. Direct estimation from projections allows accurate noise modeling, improving the results of post-reconstruction fitting. We propose a method, which we name kinetic compressive sensing (KCS), based on a hierarchical Bayesian model and on a novel reconstruction algorithm, that encodes sparsity of kinetic parameters. Parametric maps are reconstructed by maximizing the joint probability, with an Iterated Conditional Modes (ICM) approach, alternating the optimization of activity time series (OS-MAP-OSL), and kinetic parameters (MAP-LM). We evaluated the proposed algorithm on a simulated dynamic phantom: a bias/variance study confirmed how direct estimates can improve the quality of parametric maps over a post-reconstruction fitting, and showed how the novel sparsity prior can further reduce their variance, without affecting bias. Real FDG PET human brain data (Siemens mMR, 40min) images were also processed. Results enforced how the proposed KCS-regularized direct method can produce spatially coherent images and parametric maps, with lower spatial noise and better tissue contrast. A GPU-based open source implementation of the algorithm is provided.
MLE-induced Likelihood for Markov Random Fields
Due to the intractable partition function, the exact likelihood function for a Markov random field (MRF), in many situations, can only be approximated. Major approximation approaches include pseudolikelihood and Laplace approximation. In this paper, we propose a novel way of approximating the likelihood function through first approximating the marginal likelihood functions of individual parameters and then reconstructing the joint likelihood function from these marginal likelihood functions. For approximating the marginal likelihood functions, we derive a particular likelihood function from a modified scenario of coin tossing which is useful for capturing how one parameter interacts with the remaining parameters in the likelihood function. For reconstructing the joint likelihood function, we use an appropriate copula to link up these marginal likelihood functions. Numerical investigation suggests the superior performance of our approach. Especially as the size of the MRF increases, both the numerical performance and the computational cost of our approach remain consistently satisfactory, whereas Laplace approximation deteriorates and pseudolikelihood becomes computationally unbearable.
Bayesian Gradient Descent: Online Variational Bayes Learning with Increased Robustness to Catastrophic Forgetting and Weight Pruning
Zeno, Chen, Golan, Itay, Hoffer, Elad, Soudry, Daniel
We suggest a novel approach for the estimation of the posterior distribution of the weights of a neural network, using an online version of the variational Bayes method. Having a confidence measure of the weights allows to combat several shortcomings of neural networks, such as their parameter redundancy, and their notorious vulnerability to the change of input distribution ("catastrophic forgetting"). Specifically, We show that this approach helps alleviate the catastrophic forgetting phenomenon -- even without the knowledge of when the tasks are been switched. Furthermore, it improves the robustness of the network to weight pruning -- even without retraining.
A Common Framework for Natural Gradient and Taylor based Optimisation using Manifold Theory
This technical report constructs a theoretical framework to relate standard Taylor approximation based optimisation methods with Natural Gradient (NG), a method which is Fisher efficient with probabilistic models. Such a framework will be shown to also provide mathematical justification to combine higher order methods with the method of NG.
Augment and Reduce: Stochastic Inference for Large Categorical Distributions
Ruiz, Francisco J. R., Titsias, Michalis K., Dieng, Adji B., Blei, David M.
Categorical distributions are ubiquitous in machine learning, e.g., in classification, language models, and recommendation systems. They are also at the core of discrete choice models. However, when the number of possible outcomes is very large, using categorical distributions becomes computationally expensive, as the complexity scales linearly with the number of outcomes. To address this problem, we propose augment and reduce (A&R), a method to alleviate the computational complexity. A&R uses two ideas: latent variable augmentation and stochastic variational inference. It maximizes a lower bound on the marginal likelihood of the data. Unlike existing methods which are specific to softmax, A&R is more general and is amenable to other categorical models, such as multinomial probit. On several large-scale classification problems, we show that A&R provides a tighter bound on the marginal likelihood and has better predictive performance than existing approaches.
Tensorial Mixture Models
Sharir, Or, Tamari, Ronen, Cohen, Nadav, Shashua, Amnon
Casting neural networks in generative frameworks is a highly sought-after endeavor these days. Contemporary methods, such as Generative Adversarial Networks, capture some of the generative capabilities, but not all. In particular, they lack the ability of tractable marginalization, and thus are not suitable for many tasks. Other methods, based on arithmetic circuits and sum-product networks, do allow tractable marginalization, but their performance is challenged by the need to learn the structure of a circuit. Building on the tractability of arithmetic circuits, we leverage concepts from tensor analysis, and derive a family of generative models we call Tensorial Mixture Models (TMMs). TMMs assume a simple convolutional network structure, and in addition, lend themselves to theoretical analyses that allow comprehensive understanding of the relation between their structure and their expressive properties. We thus obtain a generative model that is tractable on one hand, and on the other hand, allows effective representation of rich distributions in an easily controlled manner. These two capabilities are brought together in the task of classification under missing data, where TMMs deliver state of the art accuracies with seamless implementation and design.
Variational Inference: A Review for Statisticians
Blei, David M., Kucukelbir, Alp, McAuliffe, Jon D.
One of the core problems of modern statistics is to approximate difficult-to-compute probability densities. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation involving the posterior density. In this paper, we review variational inference (VI), a method from machine learning that approximates probability densities through optimization. VI has been used in many applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of densities and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this paper is to catalyze statistical research on this class of algorithms.
thu-ml/zhusuan
ZhuSuan is a python probabilistic programming library for Bayesian deep learning, which conjoins the complimentary advantages of Bayesian methods and deep learning. Unlike existing deep learning libraries, which are mainly designed for deterministic neural networks and supervised tasks, ZhuSuan provides deep learning style primitives and algorithms for building probabilistic models and applying Bayesian inference. Variational inference with programmable variational posteriors, various objectives and advanced gradient estimators (SGVB, REINFORCE, VIMCO, etc.). ZhuSuan is still under development. Before the first stable release (1.0), please clone the repository and run This will install ZhuSuan and its dependencies automatically.