Goto

Collaborating Authors

 Bayesian Learning


Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors

arXiv.org Machine Learning

Obtaining reliable uncertainty estimates of neural network predictions is a long standing challenge. Bayesian neural networks have been proposed as a solution, but it remains open how to specify their prior. In particular, the common practice of a standard normal prior in weight space imposes only weak regularities, causing the function posterior to possibly generalize in unforeseen ways on inputs outside of the training distribution. We propose noise contrastive priors (NCPs) to obtain reliable uncertainty estimates. The key idea is to train the model to output high uncertainty for data points outside of the training distribution. NCPs do so using an input prior, which adds noise to the inputs of the current mini batch, and an output prior, which is a wide distribution given these inputs. NCPs are compatible with any model that can output uncertainty estimates, are easy to scale, and yield reliable uncertainty estimates throughout training. Empirically, we show that NCPs prevent overfitting outside of the training distribution and result in uncertainty estimates that are useful for active learning. We demonstrate the scalability of our method on the flight delays data set, where we significantly improve upon previously published results.


Gaussian Process Conditional Density Estimation

arXiv.org Machine Learning

Conditional Density Estimation (CDE) models deal with estimating conditional distributions. The conditions imposed on the distribution are the inputs of the model. CDE is a challenging task as there is a fundamental trade-off between model complexity, representational capacity and overfitting. In this work, we propose to extend the model's input with latent variables and use Gaussian processes (GP) to map this augmented input onto samples from the conditional distribution. Our Bayesian approach allows for the modeling of small datasets, but we also provide the machinery for it to be applied to big data using stochastic variational inference. Our approach can be used to model densities even in sparse data regions, and allows for sharing learned structure between conditions. We illustrate the effectiveness and wide-reaching applicability of our model on a variety of real-world problems, such as spatio-temporal density estimation of taxi drop-offs, non-Gaussian noise modeling, and few-shot learning on omniglot images.


Pseudo-Bayesian Learning with Kernel Fourier Transform as Prior

arXiv.org Machine Learning

We revisit Rahimi and Recht (2007)'s kernel random Fourier features (RFF) method through the lens of the PAC-Bayesian theory. While the primary goal of RFF is to approximate a kernel, we look at the Fourier transform as a prior distribution over trigonometric hypotheses. It naturally suggests learning a posterior on these hypotheses. We derive generalization bounds that are optimized by learning a pseudo-posterior obtained from a closed-form expression. Based on this study, we consider two learning strategies: The first one finds a compact landmarks-based representation of the data where each landmark is given by a distribution-tailored similarity measure, while the second one provides a PAC-Bayesian justification to the kernel alignment method of Sinha and Duchi (2016).


STFT spectral loss for training a neural speech waveform model

arXiv.org Machine Learning

This paper proposes a new loss using short-time Fourier transform (STFT) spectra for the aim of training a high-performance neural speech waveform model that predicts raw continuous speech waveform samples directly. Not only amplitude spectra but also phase spectra obtained from generated speech waveforms are used to calculate the proposed loss. We also mathematically show that training of the waveform model on the basis of the proposed loss can be interpreted as maximum likelihood training that assumes the amplitude and phase spectra of generated speech waveforms following Gaussian and von Mises distributions, respectively. Furthermore, this paper presents a simple network architecture as the speech waveform model, which is composed of uni-directional long short-term memories (LSTMs) and an auto-regressive structure. Experimental results showed that the proposed neural model synthesized high-quality speech waveforms.


Computational Intelligence in Sports: A Systematic Literature Review

arXiv.org Artificial Intelligence

Recently, data mining studies are being successfully conducted to estimate several parameters in a variety of domains. Data mining techniques have attracted the attention of the information industry and society as a whole, due to a large amount of data and the imminent need to turn it into useful knowledge. However, the effective use of data in some areas is still under development, as is the case in sports, which in recent years, has presented a slight growth; consequently, many sports organizations have begun to see that there is a wealth of unexplored knowledge in the data extracted by them. Therefore, this article presents a systematic review of sports data mining. Regarding years 2010 to 2018, 31 types of research were found in this topic. Based on these studies, we present the current panorama, themes, the database used, proposals, algorithms, and research opportunities. Our findings provide a better understanding of the sports data mining potentials, besides motivating the scientific community to explore this timely and interesting topic.


Principled Uncertainty Estimation for Deep Neural Networks

arXiv.org Machine Learning

When the cost of misclassifying a sample is high, it is useful to have an accurate estimate of uncertainty in the prediction for that sample. There are also multiple types of uncertainty which are best estimated in different ways, for example, uncertainty that is intrinsic to the training set may be well-handled by a Bayesian approach, while uncertainty introduced by shifts between training and query distributions may be better-addressed by density/support estimation. In this paper, we examine three types of uncertainty: model capacity uncertainty, intrinsic data uncertainty, and open set uncertainty, and review techniques that have been derived to address each one. We then introduce a unified hierarchical model, which combines methods from Bayesian inference, invertible latent density inference, and discriminative classification in a single end-to-end deep neural network topology to yield efficient per-sample uncertainty estimation.


Prior-preconditioned conjugate gradient for accelerated Gibbs sampling in "large n & large p" sparse Bayesian logistic regression models

arXiv.org Machine Learning

In a modern observational study based on healthcare databases, the number of observations typically ranges in the order of 10^5 ~ 10^6 and that of the predictors in the order of 10^4 ~ 10^5. Despite the large sample size, data rarely provide sufficient information to reliably estimate such a large number of parameters. Sparse regression provides a potential solution. There is a rich literature on desirable theoretical properties of Bayesian approaches based on shrinkage priors. On the other hand, the development of scalable methods for the required posterior computation has largely been limited to the p >> n case. Shrinkage priors make the posterior amenable to Gibbs sampling, but a major computational bottleneck arises from the need to sample from a high-dimensional Gaussian distribution at each iteration. Despite a closed-form expression for the precision matrix $\Phi$, computing and factorizing such a large matrix is computationally expensive nonetheless. In this article, we present a novel algorithm to speed up this bottleneck based on the following observation: we can cheaply generate a random vector $b$ such that the solution to the linear system $\Phi \beta = b$ has the desired Gaussian distribution. We can then solve the linear system by the conjugate gradient (CG) algorithm through the matrix-vector multiplications by $\Phi$, without ever explicitly inverting $\Phi$. Practical performance of CG, however, depends critically on appropriate preconditioning of the linear system; we turn CG into an effective algorithm for sparse Bayesian regression by developing a theory of prior-preconditioning. We apply our algorithm to a large-scale observational study with n = 72,489 and p = 22,175, designed to assess the relative risk of intracranial hemorrhage from two alternative blood anti-coagulants. Our algorithm demonstrates an order of magnitude speed-up in the posterior computation.


Learning and Inference in Hilbert Space with Quantum Graphical Models

arXiv.org Machine Learning

Quantum Graphical Models (QGMs) generalize classical graphical models by adopting the formalism for reasoning about uncertainty from quantum mechanics. Unlike classical graphical models, QGMs represent uncertainty with density matrices in complex Hilbert spaces. Hilbert space embeddings (HSEs) also generalize Bayesian inference in Hilbert spaces. We investigate the link between QGMs and HSEs and show that the sum rule and Bayes rule for QGMs are equivalent to the kernel sum rule in HSEs and a special case of Nadaraya-Watson kernel regression, respectively. We show that these operations can be kernelized, and use these insights to propose a Hilbert Space Embedding of Hidden Quantum Markov Models (HSE-HQMM) to model dynamics. We present experimental results showing that HSE-HQMMs are competitive with state-of-the-art models like LSTMs and PSRNNs on several datasets, while also providing a nonparametric method for maintaining a probability distribution over continuous-valued features.


Learning Gaussian Processes by Minimizing PAC-Bayesian Generalization Bounds

arXiv.org Machine Learning

Gaussian Processes (GPs) are a generic modelling tool for supervised learning. While they have been successfully applied on large datasets, their use in safety-critical applications is hindered by the lack of good performance guarantees. To this end, we propose a method to learn GPs and their sparse approximations by directly optimizing a PAC-Bayesian bound on their generalization performance, instead of maximizing the marginal likelihood. Besides its theoretical appeal, we find in our evaluation that our learning method is robust and yields significantly better generalization guarantees than other common GP approaches on several regression benchmark datasets.


Approximate Bayesian Computation via Population Monte Carlo and Classification

arXiv.org Machine Learning

Approximate Bayesian computation (ABC) methods can be used to sample from posterior distributions when the likelihood function is unavailable or intractable, as is often the case in biological systems. Sequential Monte Carlo (SMC) methods have been combined with ABC to improve efficiency, however these approaches require many simulations from the likelihood. We propose a classification approach within a population Monte Carlo (PMC) framework, where model class probabilities are used to update the particle weights. Our proposed approach outperforms state-of-the-art ratio estimation methods while retaining the automatic selection of summary statistics, and performs competitively with SMC ABC.