Goto

Collaborating Authors

 Bayesian Inference


Non-Canonical Hamiltonian Monte Carlo

arXiv.org Machine Learning

Hamiltonian Monte Carlo is typically based on the assumption of an underlying canonical symplectic structure. Numerical integrators designed for the canonical structure are incompatible with motion generated by non-canonical dynamics. These non-canonical dynamics, motivated by examples in physics and symplectic geometry, correspond to techniques such as preconditioning which are routinely used to improve algorithmic performance. Indeed, recently, a special case of non-canonical structure, magnetic Hamiltonian Monte Carlo, was demonstrated to provide advantageous sampling properties. We present a framework for Hamiltonian Monte Carlo using non-canonical symplectic structures. Our experimental results demonstrate sampling advantages associated to Hamiltonian Monte Carlo with non-canonical structure. To summarize our contributions: (i) we develop non-canonical HMC from foundations in symplectic geomtry; (ii) we construct an HMC procedure using implicit integration that satisfies the detailed balance; (iii) we propose to accelerate the sampling using an {\em approximate} explicit methodology; (iv) we study two novel, randomly-generated non-canonical structures: magnetic momentum and the coupled magnet structure, with implicit and explicit integration.


Selecting Data Adaptive Learner from Multiple Deep Learners using Bayesian Networks

arXiv.org Artificial Intelligence

A method to predict time-series using multiple deep learners and a Bayesian network is proposed. In this study, the input explanatory variables are Bayesian network nodes that are associated with learners. Training data are divided using K-means clustering, and multiple deep learners are trained depending on the cluster. A Bayesian network is used to determine which deep learner is in charge of predicting a time-series. We determine a threshold value and select learners with a posterior probability equal to or greater than the threshold value, which could facilitate more robust prediction. The proposed method is applied to financial time-series data, and the predicted results for the Nikkei 225 index are demonstrated.


Investigating maximum likelihood based training of infinite mixtures for uncertainty quantification

arXiv.org Artificial Intelligence

Uncertainty quantification in neural networks gained a lot of attention in the past years. The most popular approaches, Bayesian neural networks (BNNs), Monte Carlo dropout, and deep ensembles have one thing in common: they are all based on some kind of mixture model. While the BNNs build infinite mixture models and are derived via variational inference, the latter two build finite mixtures trained with the maximum likelihood method. In this work we investigate the effect of training an infinite mixture distribution with the maximum likelihood method instead of variational inference. We find that the proposed objective leads to stochastic networks with an increased predictive variance, which improves uncertainty based identification of miss-classification and robustness against adversarial attacks in comparison to a standard BNN with equivalent network structure. The new model also displays higher entropy on out-of-distribution data.


How to Do Things with Words: A Bayesian Approach

Journal of Artificial Intelligence Research

Communication changes the beliefs of the listener and of the speaker. The value of a communicative act stems from the valuable belief states which result from this act. To model this we build on the Interactive POMDP (IPOMDP) framework, which extends POMDPs to allow agents to model others in multi-agent settings, and we include communication that can take place between the agents to formulate Communicative IPOMDPs (CIPOMDPs). We treat communication as a type of action and therefore, decisions regarding communicative acts are based on decision-theoretic planning using the Bellman optimality principle and value iteration, just as they are for all other rational actions. As in any form of planning, the results of actions need to be precisely specified. We use the Bayes' theorem to derive how agents update their beliefs in CIPOMDPs; updates are due to agents' actions, observations, messages they send to other agents, and messages they receive from others. The Bayesian decision-theoretic approach frees us from the commonly made assumption of cooperative discourse - we consider agents which are free to be dishonest while communicating and are guided only by their selfish rationality. We use a simple Tiger game to illustrate the belief update, and to show that the ability to rationally communicate allows agents to improve efficiency of their interactions.


Why Deep Learning Ensembles Outperform Bayesian Neural Networks

#artificialintelligence

Recently I came across an interesting Paper named, "Deep Ensembles: A Loss Landscape Perspective" by a Laxshminarayan et al.In this article, I will break down the paper, summarise it's findings and delve into some of the techniques and strategies they used that will be useful for delving into understanding models and their learning process. It will also go over some possible extensions to the paper. You can also find my annotations on the paper down below. The authors conjectured (correctly) that Deep Ensembles (an ensemble of Deep learning models) outperform Bayesian Neural Networks because "popular scalable variational Bayesian methods tend to focus on a single mode, whereas deep ensembles tend to explore diverse modes in function space." In simple words, when running a Bayesian Network at a single initialization it will reach one of the peaks and stop.


Beyond Point Estimate: Inferring Ensemble Prediction Variation from Neuron Activation Strength in Recommender Systems

arXiv.org Machine Learning

Despite deep neural network (DNN)'s impressive prediction performance in various domains, it is well known now that a set of DNN models trained with the same model specification and the same data can produce very different prediction results. Ensemble method is one state-of-the-art benchmark for prediction uncertainty estimation. However, ensembles are expensive to train and serve for web-scale traffic. In this paper, we seek to advance the understanding of prediction variation estimated by the ensemble method. Through empirical experiments on two widely used benchmark datasets MovieLens and Criteo in recommender systems, we observe that prediction variations come from various randomness sources, including training data shuffling, and parameter random initialization. By introducing more randomness into model training, we notice that ensemble's mean predictions tend to be more accurate while the prediction variations tend to be higher. Moreover, we propose to infer prediction variation from neuron activation strength and demonstrate the strong prediction power from activation strength features. Our experiment results show that the average R squared on MovieLens is as high as 0.56 and on Criteo is 0.81. Our method performs especially well when detecting the lowest and highest variation buckets, with 0.92 AUC and 0.89 AUC respectively. Our approach provides a simple way for prediction variation estimation, which opens up new opportunities for future work in many interesting areas (e.g.,model-based reinforcement learning) without relying on serving expensive ensemble models.


Bayesian Quantile Matching Estimation

arXiv.org Machine Learning

Due to data protection laws sensitive personal data cannot be released or shared among businesses as well as scientific institutions. While anonymization techniques are becoming increasingly popular, they often raise security concerns and have been re-identified in some cases Narayanan and Shmatikov (2010). To be on the safe side, big data collecting organisation such as Eurostat (statistical office of the European Union) or the World Bank only release aggregated summaries of their data. E.g.: Instead of individual salary data only selected quantiles of the population distribution are available. Thus, for exploratory analysis as well as statistical modeling, the need for methods which work on aggregated data is there.


Data-Informed Decomposition for Localized Uncertainty Quantification of Dynamical Systems

arXiv.org Machine Learning

Industrial dynamical systems often exhibit multi-scale response due to material heterogeneities, operation conditions and complex environmental loadings. In such problems, it is the case that the smallest length-scale of the systems dynamics controls the numerical resolution required to effectively resolve the embedded physics. In practice however, high numerical resolutions is only required in a confined region of the system where fast dynamics or localized material variability are exhibited, whereas a coarser discretization can be sufficient in the rest majority of the system. To this end, a unified computational scheme with uniform spatio-temporal resolutions for uncertainty quantification can be very computationally demanding. Partitioning the complex dynamical system into smaller easier-to-solve problems based of the localized dynamics and material variability can reduce the overall computational cost. However, identifying the region of interest for high-resolution and intensive uncertainty quantification can be a problem dependent. The region of interest can be specified based on the localization features of the solution, user interest, and correlation length of the random material properties. For problems where a region of interest is not evident, Bayesian inference can provide a feasible solution. In this work, we employ a Bayesian framework to update our prior knowledge on the localized region of interest using measurements and system response. To address the computational cost of the Bayesian inference, we construct a Gaussian process surrogate for the forward model. Once, the localized region of interest is identified, we use polynomial chaos expansion to propagate the localization uncertainty. We demonstrate our framework through numerical experiments on a three-dimensional elastodynamic problem.


VarFA: A Variational Factor Analysis Framework For Efficient Bayesian Learning Analytics

arXiv.org Machine Learning

We propose VarFA, a variational inference factor analysis framework that extends existing factor analysis models for educational data mining to efficiently output uncertainty estimation in the model's estimated factors. Such uncertainty information is useful, for example, for an adaptive testing scenario, where additional tests can be administered if the model is not quite certain about a students' skill level estimation. Traditional Bayesian inference methods that produce such uncertainty information are computationally expensive and do not scale to large data sets. VarFA utilizes variational inference which makes it possible to efficiently perform Bayesian inference even on very large data sets. We use the sparse factor analysis model as a case study and demonstrate the efficacy of VarFA on both synthetic and real data sets. VarFA is also very general and can be applied to a wide array of factor analysis models.


A statistical theory of cold posteriors in deep neural networks

arXiv.org Machine Learning

To get Bayesian neural networks to perform comparably to standard neural networks it is usually necessary to artificially reduce uncertainty using a "tempered" or "cold" posterior. This is extremely concerning: if the prior is accurate, Bayes inference/decision theory is optimal, and any artificial changes to the posterior should harm performance. While this suggests that the prior may be at fault, here we argue that in fact, BNNs for image classification use the wrong likelihood. In particular, standard image benchmark datasets such as CIFAR-10 are carefully curated. We develop a generative model describing curation which gives a principled Bayesian account of cold posteriors, because the likelihood under this new generative model closely matches the tempered likelihoods used in past work.