Bayesian Inference
Gaussian Process Conditional Density Estimation
Dutordoir, Vincent, Salimbeni, Hugh, Deisenroth, Marc, Hensman, James
Conditional Density Estimation (CDE) models deal with estimating conditional distributions. The conditions imposed on the distribution are the inputs of the model. CDE is a challenging task as there is a fundamental trade-off between model complexity, representational capacity and overfitting. In this work, we propose to extend the model's input with latent variables and use Gaussian processes (GP) to map this augmented input onto samples from the conditional distribution. Our Bayesian approach allows for the modeling of small datasets, but we also provide the machinery for it to be applied to big data using stochastic variational inference. Our approach can be used to model densities even in sparse data regions, and allows for sharing learned structure between conditions. We illustrate the effectiveness and wide-reaching applicability of our model on a variety of real-world problems, such as spatio-temporal density estimation of taxi drop-offs, non-Gaussian noise modeling, and few-shot learning on omniglot images.
Pseudo-Bayesian Learning with Kernel Fourier Transform as Prior
Letarte, Gaรซl, Morvant, Emilie, Germain, Pascal
We revisit Rahimi and Recht (2007)'s kernel random Fourier features (RFF) method through the lens of the PAC-Bayesian theory. While the primary goal of RFF is to approximate a kernel, we look at the Fourier transform as a prior distribution over trigonometric hypotheses. It naturally suggests learning a posterior on these hypotheses. We derive generalization bounds that are optimized by learning a pseudo-posterior obtained from a closed-form expression. Based on this study, we consider two learning strategies: The first one finds a compact landmarks-based representation of the data where each landmark is given by a distribution-tailored similarity measure, while the second one provides a PAC-Bayesian justification to the kernel alignment method of Sinha and Duchi (2016).
Computational Intelligence in Sports: A Systematic Literature Review
Bonidia, Robson P., Rodrigues, Luiz A. L., Avila-Santos, Anderson P., Sanches, Danilo S., Brancher, Jacques D.
Recently, data mining studies are being successfully conducted to estimate several parameters in a variety of domains. Data mining techniques have attracted the attention of the information industry and society as a whole, due to a large amount of data and the imminent need to turn it into useful knowledge. However, the effective use of data in some areas is still under development, as is the case in sports, which in recent years, has presented a slight growth; consequently, many sports organizations have begun to see that there is a wealth of unexplored knowledge in the data extracted by them. Therefore, this article presents a systematic review of sports data mining. Regarding years 2010 to 2018, 31 types of research were found in this topic. Based on these studies, we present the current panorama, themes, the database used, proposals, algorithms, and research opportunities. Our findings provide a better understanding of the sports data mining potentials, besides motivating the scientific community to explore this timely and interesting topic.
Principled Uncertainty Estimation for Deep Neural Networks
Harang, Richard, Rudd, Ethan M.
When the cost of misclassifying a sample is high, it is useful to have an accurate estimate of uncertainty in the prediction for that sample. There are also multiple types of uncertainty which are best estimated in different ways, for example, uncertainty that is intrinsic to the training set may be well-handled by a Bayesian approach, while uncertainty introduced by shifts between training and query distributions may be better-addressed by density/support estimation. In this paper, we examine three types of uncertainty: model capacity uncertainty, intrinsic data uncertainty, and open set uncertainty, and review techniques that have been derived to address each one. We then introduce a unified hierarchical model, which combines methods from Bayesian inference, invertible latent density inference, and discriminative classification in a single end-to-end deep neural network topology to yield efficient per-sample uncertainty estimation.
Prior-preconditioned conjugate gradient for accelerated Gibbs sampling in "large n & large p" sparse Bayesian logistic regression models
Nishimura, Akihiko, Suchard, Marc A.
In a modern observational study based on healthcare databases, the number of observations typically ranges in the order of 10^5 ~ 10^6 and that of the predictors in the order of 10^4 ~ 10^5. Despite the large sample size, data rarely provide sufficient information to reliably estimate such a large number of parameters. Sparse regression provides a potential solution. There is a rich literature on desirable theoretical properties of Bayesian approaches based on shrinkage priors. On the other hand, the development of scalable methods for the required posterior computation has largely been limited to the p >> n case. Shrinkage priors make the posterior amenable to Gibbs sampling, but a major computational bottleneck arises from the need to sample from a high-dimensional Gaussian distribution at each iteration. Despite a closed-form expression for the precision matrix $\Phi$, computing and factorizing such a large matrix is computationally expensive nonetheless. In this article, we present a novel algorithm to speed up this bottleneck based on the following observation: we can cheaply generate a random vector $b$ such that the solution to the linear system $\Phi \beta = b$ has the desired Gaussian distribution. We can then solve the linear system by the conjugate gradient (CG) algorithm through the matrix-vector multiplications by $\Phi$, without ever explicitly inverting $\Phi$. Practical performance of CG, however, depends critically on appropriate preconditioning of the linear system; we turn CG into an effective algorithm for sparse Bayesian regression by developing a theory of prior-preconditioning. We apply our algorithm to a large-scale observational study with n = 72,489 and p = 22,175, designed to assess the relative risk of intracranial hemorrhage from two alternative blood anti-coagulants. Our algorithm demonstrates an order of magnitude speed-up in the posterior computation.
Learning and Inference in Hilbert Space with Quantum Graphical Models
Srinivasan, Siddarth, Downey, Carlton, Boots, Byron
Quantum Graphical Models (QGMs) generalize classical graphical models by adopting the formalism for reasoning about uncertainty from quantum mechanics. Unlike classical graphical models, QGMs represent uncertainty with density matrices in complex Hilbert spaces. Hilbert space embeddings (HSEs) also generalize Bayesian inference in Hilbert spaces. We investigate the link between QGMs and HSEs and show that the sum rule and Bayes rule for QGMs are equivalent to the kernel sum rule in HSEs and a special case of Nadaraya-Watson kernel regression, respectively. We show that these operations can be kernelized, and use these insights to propose a Hilbert Space Embedding of Hidden Quantum Markov Models (HSE-HQMM) to model dynamics. We present experimental results showing that HSE-HQMMs are competitive with state-of-the-art models like LSTMs and PSRNNs on several datasets, while also providing a nonparametric method for maintaining a probability distribution over continuous-valued features.
Learning Gaussian Processes by Minimizing PAC-Bayesian Generalization Bounds
Reeb, David, Doerr, Andreas, Gerwinn, Sebastian, Rakitsch, Barbara
Gaussian Processes (GPs) are a generic modelling tool for supervised learning. While they have been successfully applied on large datasets, their use in safety-critical applications is hindered by the lack of good performance guarantees. To this end, we propose a method to learn GPs and their sparse approximations by directly optimizing a PAC-Bayesian bound on their generalization performance, instead of maximizing the marginal likelihood. Besides its theoretical appeal, we find in our evaluation that our learning method is robust and yields significantly better generalization guarantees than other common GP approaches on several regression benchmark datasets.
Approximate Bayesian Computation via Population Monte Carlo and Classification
Rogers-Smith, Charlie, Pesonen, Henri, Kaski, Samuel
Approximate Bayesian computation (ABC) methods can be used to sample from posterior distributions when the likelihood function is unavailable or intractable, as is often the case in biological systems. Sequential Monte Carlo (SMC) methods have been combined with ABC to improve efficiency, however these approaches require many simulations from the likelihood. We propose a classification approach within a population Monte Carlo (PMC) framework, where model class probabilities are used to update the particle weights. Our proposed approach outperforms state-of-the-art ratio estimation methods while retaining the automatic selection of summary statistics, and performs competitively with SMC ABC.
Variational Calibration of Computer Models
Marmin, Sรฉbastien, Filippone, Maurizio
Bayesian calibration of black-box computer models offers an established framework to obtain a posterior distribution over model parameters. Traditional Bayesian calibration involves the emulation of the computer model and an additive model discrepancy term using Gaussian processes; inference is then carried out using MCMC. These choices pose computational and statistical challenges and limitations, which we overcome by proposing the use of approximate Deep Gaussian processes and variational inference techniques. The result is a practical and scalable framework for calibration, which obtains competitive performance compared to the state-of-the-art.
Regularized Maximum Likelihood Estimation and Feature Selection in Mixtures-of-Experts Models
Chamroukhi, Faicel, Huynh, Bao-Tuyen
Mixture of Experts (MoE) are successful models for modeling heterogeneous data in many statistical learning problems including regression, clustering and classification. Generally fitted by maximum likelihood estimation via the well-known EM algorithm, their application to high-dimensional problems is still therefore challenging. We consider the problem of fitting and feature selection in MoE models, and propose a regularized maximum likelihood estimation approach that encourages sparse solutions for heterogeneous regression data models with potentially high-dimensional predictors. Unlike state-of-the art regularized MLE for MoE, the proposed modelings do not require an approximate of the penalty function. We develop two hybrid EM algorithms: an Expectation-Majorization-Maximization (EM/MM) algorithm, and an EM algorithm with coordinate ascent algorithm. The proposed algorithms allow to automatically obtaining sparse solutions without thresholding, and avoid matrix inversion by allowing univariate parameter updates. An experimental study shows the good performance of the algorithms in terms of recovering the actual sparse solutions, parameter estimation, and clustering of heterogeneous regression data.