AITopics | Bayesian Inference

Bayesian Inference

Bayes' theorem allows a program to infer the probabilities of likely causes from their observed effects when what it is given are the probabilities of the effects given the causes, together with the prior probabilities of the causes.
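
As a minimal sketch of the statement above, the following Python snippet inverts P(effect | cause) into P(cause | effect). The function name `posterior` and all numbers are hypothetical, chosen only for illustration.

```python
def posterior(prior, likelihood):
    """Return P(cause | effect) for each cause via Bayes' theorem.

    prior:      dict mapping cause -> P(cause)
    likelihood: dict mapping cause -> P(effect | cause)
    """
    # Unnormalized posterior: P(effect | cause) * P(cause)
    joint = {c: likelihood[c] * prior[c] for c in prior}
    # P(effect), obtained by summing over all causes
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Hypothetical numbers: a condition with 1% prevalence and a test that
# fires for 90% of affected cases and 5% of unaffected cases.
prior = {"condition": 0.01, "no_condition": 0.99}
likelihood = {"condition": 0.90, "no_condition": 0.05}
post = posterior(prior, likelihood)
print(post)
```

Even with a fairly reliable test, the posterior probability of the rare cause stays well below one half here, because the prior weighs heavily against it.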

Non-Canonical Hamiltonian Monte Carlo

Brofos, James A., Lederman, Roy R.

arXiv.org Machine Learning, Aug-18-2020

Hamiltonian Monte Carlo is typically based on the assumption of an underlying canonical symplectic structure. Numerical integrators designed for the canonical structure are incompatible with motion generated by non-canonical dynamics. These non-canonical dynamics, motivated by examples in physics and symplectic geometry, correspond to techniques such as preconditioning which are routinely used to improve algorithmic performance. Indeed, recently, a special case of non-canonical structure, magnetic Hamiltonian Monte Carlo, was demonstrated to provide advantageous sampling properties. We present a framework for Hamiltonian Monte Carlo using non-canonical symplectic structures. Our experimental results demonstrate sampling advantages associated with Hamiltonian Monte Carlo with non-canonical structure. To summarize our contributions: (i) we develop non-canonical HMC from foundations in symplectic geometry; (ii) we construct an HMC procedure using implicit integration that satisfies detailed balance; (iii) we propose to accelerate the sampling using an approximate explicit methodology; (iv) we study two novel, randomly-generated non-canonical structures: magnetic momentum and the coupled magnet structure, with implicit and explicit integration.
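
For orientation, the canonical baseline that the abstract departs from can be sketched in a few lines. This is standard one-dimensional HMC with the explicit leapfrog integrator and a unit mass, not the authors' non-canonical method; the function names, step size, and trajectory length are illustrative assumptions.

```python
import math
import random


def leapfrog(q, p, grad_u, step, n_steps):
    """Leapfrog integrator for H(q, p) = U(q) + p**2 / 2 (canonical structure)."""
    p -= 0.5 * step * grad_u(q)        # initial half step for momentum
    for i in range(n_steps):
        q += step * p                  # full step for position
        if i < n_steps - 1:
            p -= step * grad_u(q)      # full step for momentum
    p -= 0.5 * step * grad_u(q)        # final half step for momentum
    return q, p


def hmc(u, grad_u, q0, n_samples, step=0.2, n_steps=10, seed=0):
    """Draw samples from the density proportional to exp(-u(q))."""
    rng = random.Random(seed)
    q, samples = q0, []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)        # resample momentum from N(0, 1)
        q_new, p_new = leapfrog(q, p, grad_u, step, n_steps)
        # Metropolis correction for the integrator's energy error
        h_old = u(q) + 0.5 * p * p
        h_new = u(q_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples


# Assumed target: standard normal, U(q) = q**2 / 2, so grad U(q) = q
samples = hmc(lambda q: 0.5 * q * q, lambda q: q, q0=0.0, n_samples=5000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

The leapfrog scheme is tailored to the canonical equations of motion (dq/dt = p, dp/dt = -grad U); the abstract's point is that non-canonical dynamics call for a different integrator, for which the authors use implicit integration or an approximate explicit one.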

artificial intelligence, integrator, machine learning, (16 more...)

arXiv: 2008.08191

Country:

North America > Canada > Ontario > Toronto (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York (0.04)
Europe > Spain > Canary Islands (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Selecting Data Adaptive Learner from Multiple Deep Learners using Bayesian Networks

Kobayashi, Shusuke, Shirayama, Susumu

arXiv.org Artificial Intelligence, Aug-17-2020

A method to predict time-series using multiple deep learners and a Bayesian network is proposed. In this study, the input explanatory variables are Bayesian network nodes that are associated with learners. Training data are divided using K-means clustering, and multiple deep learners are trained depending on the cluster. A Bayesian network is used to determine which deep learner is in charge of predicting a time-series. We determine a threshold value and select learners with a posterior probability equal to or greater than the threshold value, which could facilitate more robust prediction. The proposed method is applied to financial time-series data, and the predicted results for the Nikkei 225 index are demonstrated.
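
The threshold-based selection step described above can be sketched as follows. The abstract does not say how the selected learners' outputs are combined, so the posterior-weighted average, the fallback to the single most probable learner, and all names and numbers below are assumptions made for illustration.

```python
def select_learners(posteriors, threshold):
    """Return the names of learners whose posterior probability >= threshold."""
    return [name for name, p in posteriors.items() if p >= threshold]


def combined_prediction(predictions, posteriors, threshold):
    """Posterior-weighted average of the selected learners' predictions.

    The weighting scheme is an assumption; the abstract only states that
    learners at or above the threshold are selected.
    """
    chosen = select_learners(posteriors, threshold)
    if not chosen:
        # Assumed fallback: use the single most probable learner
        chosen = [max(posteriors, key=posteriors.get)]
    total = sum(posteriors[n] for n in chosen)
    return sum(posteriors[n] * predictions[n] for n in chosen) / total


# Hypothetical posteriors from the Bayesian network, one per cluster-specific learner
posteriors = {"cluster_0": 0.6, "cluster_1": 0.3, "cluster_2": 0.1}
# Hypothetical next-step predictions from each learner
predictions = {"cluster_0": 100.0, "cluster_1": 110.0, "cluster_2": 90.0}

chosen = select_learners(posteriors, threshold=0.25)
estimate = combined_prediction(predictions, posteriors, threshold=0.25)
print(chosen, estimate)
```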

artificial intelligence, bayesian network, machine learning, (17 more...)

doi: 10.1007/s00521-020-05234-6

arXiv: 2008.07709

Country:

North America > United States (0.46)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.49)

Industry:

Banking & Finance > Trading (1.00)
Energy (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Investigating maximum likelihood based training of infinite mixtures for uncertainty quantification

Däubener, Sina, Fischer, Asja

arXiv.org Artificial Intelligence, Aug-17-2020

Uncertainty quantification in neural networks has gained a lot of attention in recent years. The most popular approaches, Bayesian neural networks (BNNs), Monte Carlo dropout, and deep ensembles, have one thing in common: they are all based on some kind of mixture model. While BNNs build infinite mixture models and are derived via variational inference, the latter two build finite mixtures trained with the maximum likelihood method. In this work we investigate the effect of training an infinite mixture distribution with the maximum likelihood method instead of variational inference. We find that the proposed objective leads to stochastic networks with an increased predictive variance, which improves uncertainty-based identification of misclassification and robustness against adversarial attacks in comparison to a standard BNN with equivalent network structure. The new model also displays higher entropy on out-of-distribution data.

artificial intelligence, machine learning, variance, (19 more...)

arXiv: 2008.03209

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Germany (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.91)

How to Do Things with Words: A Bayesian Approach

Gmytrasiewicz, Piotr (University of Illinois at Chicago)

Journal of Artificial Intelligence Research, Aug-17-2020

Communication changes the beliefs of the listener and of the speaker. The value of a communicative act stems from the valuable belief states which result from this act. To model this we build on the Interactive POMDP (IPOMDP) framework, which extends POMDPs to allow agents to model others in multi-agent settings, and we include communication that can take place between the agents to formulate Communicative IPOMDPs (CIPOMDPs). We treat communication as a type of action, and therefore decisions regarding communicative acts are based on decision-theoretic planning using the Bellman optimality principle and value iteration, just as they are for all other rational actions. As in any form of planning, the results of actions need to be precisely specified. We use Bayes' theorem to derive how agents update their beliefs in CIPOMDPs; updates are due to agents' actions, observations, messages they send to other agents, and messages they receive from others. The Bayesian decision-theoretic approach frees us from the commonly made assumption of cooperative discourse: we consider agents which are free to be dishonest while communicating and are guided only by their selfish rationality. We use a simple Tiger game to illustrate the belief update, and to show that the ability to rationally communicate allows agents to improve the efficiency of their interactions.
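
To make the belief update concrete, here is a minimal sketch of the Bayes update in the classic single-agent Tiger problem, where an agent listens for a tiger behind one of two doors. It covers only the observation update, not the interactive or communicative extensions the abstract describes, and the 0.85 listening accuracy is an assumed value for illustration.

```python
def update_belief(b_left, heard, accuracy=0.85):
    """Bayes update of P(tiger behind left door) after one 'listen' action.

    heard:    "left" or "right", the side the growl seemed to come from
    accuracy: P(hear the correct side | tiger location), an assumed value
    """
    like_left = accuracy if heard == "left" else 1.0 - accuracy
    like_right = 1.0 - accuracy if heard == "left" else accuracy
    numerator = like_left * b_left
    return numerator / (numerator + like_right * (1.0 - b_left))


belief = 0.5  # uniform prior over the two doors
for obs in ["left", "left", "right"]:
    belief = update_belief(belief, obs)
    print(obs, round(belief, 4))
```

Because the two observation likelihoods are symmetric, one observation pointing right exactly cancels one observation pointing left, so the belief after this sequence equals the belief after a single "left".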

artificial intelligence, belief revision, machine learning, (18 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.11951

AI Access Foundation

11951

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Kosovo > District of Gjilan > Kamenica (0.04)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Why Deep Learning Ensembles Outperform Bayesian Neural Networks

#artificialintelligence, Aug-16-2020, 13:31:45 GMT

Recently I came across an interesting paper named "Deep Ensembles: A Loss Landscape Perspective" by Lakshminarayanan et al. In this article, I will break down the paper, summarise its findings, and delve into some of the techniques and strategies the authors used that are useful for understanding models and their learning process. I will also go over some possible extensions to the paper. You can also find my annotations on the paper down below. The authors conjectured (correctly) that Deep Ensembles (an ensemble of Deep learning models) outperform Bayesian Neural Networks because "popular scalable variational Bayesian methods tend to focus on a single mode, whereas deep ensembles tend to explore diverse modes in function space." In simple words, when running a Bayesian neural network from a single initialization, it will reach one of the peaks and stop.

artificial intelligence, machine learning, trajectory, (16 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.58)

Beyond Point Estimate: Inferring Ensemble Prediction Variation from Neuron Activation Strength in Recommender Systems

Chen, Zhe, Wang, Yuyan, Lin, Dong, Cheng, Derek Zhiyuan, Hong, Lichan, Chi, Ed H., Cui, Claire

arXiv.org Machine Learning, Aug-16-2020

Despite the impressive prediction performance of deep neural networks (DNNs) in various domains, it is now well known that a set of DNN models trained with the same model specification and the same data can produce very different prediction results. The ensemble method is one state-of-the-art benchmark for prediction uncertainty estimation. However, ensembles are expensive to train and serve for web-scale traffic. In this paper, we seek to advance the understanding of prediction variation estimated by the ensemble method. Through empirical experiments on two widely used benchmark datasets in recommender systems, MovieLens and Criteo, we observe that prediction variations come from various sources of randomness, including training data shuffling and random parameter initialization. By introducing more randomness into model training, we notice that the ensemble's mean predictions tend to be more accurate while the prediction variations tend to be higher. Moreover, we propose to infer prediction variation from neuron activation strength and demonstrate the strong predictive power of activation strength features. Our experimental results show that the average R squared on MovieLens is as high as 0.56 and on Criteo is 0.81. Our method performs especially well when detecting the lowest and highest variation buckets, with 0.92 AUC and 0.89 AUC respectively. Our approach provides a simple way to estimate prediction variation, which opens up new opportunities for future work in many interesting areas (e.g., model-based reinforcement learning) without relying on serving expensive ensemble models.

artificial intelligence, bayesian inference, machine learning, (13 more...)

arXiv: 2008.07032

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Bayesian Quantile Matching Estimation

Nirwan, Rajbir-Singh, Bertschinger, Nils

arXiv.org Machine Learning, Aug-14-2020

Due to data protection laws, sensitive personal data cannot be released or shared among businesses or scientific institutions. While anonymization techniques are becoming increasingly popular, they often raise security concerns, and anonymized data have been re-identified in some cases (Narayanan and Shmatikov, 2010). To be on the safe side, big data-collecting organisations such as Eurostat (the statistical office of the European Union) or the World Bank only release aggregated summaries of their data. For example, instead of individual salary data, only selected quantiles of the population distribution are available. Thus, for exploratory analysis as well as statistical modeling, there is a need for methods that work on aggregated data.

gaussian noise model, order statistics, quantile, (9 more...)

arXiv: 2008.06423

Country: Europe > Germany > Hesse > Darmstadt Region > Frankfurt (0.04)

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > Europe Government (0.34)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Data-Informed Decomposition for Localized Uncertainty Quantification of Dynamical Systems

Subber, Waad, Ghosh, Sayan, Pandita, Piyush, Zhang, Yiming, Wang, Liping

arXiv.org Machine Learning, Aug-14-2020

Industrial dynamical systems often exhibit a multi-scale response due to material heterogeneities, operating conditions, and complex environmental loadings. In such problems, the smallest length-scale of the system's dynamics controls the numerical resolution required to effectively resolve the embedded physics. In practice, however, high numerical resolution is only required in a confined region of the system where fast dynamics or localized material variability are exhibited, whereas a coarser discretization can be sufficient in the remaining majority of the system. Consequently, a unified computational scheme with uniform spatio-temporal resolution for uncertainty quantification can be very computationally demanding. Partitioning the complex dynamical system into smaller, easier-to-solve problems based on the localized dynamics and material variability can reduce the overall computational cost. However, identifying the region of interest for high-resolution and intensive uncertainty quantification can be problem dependent. The region of interest can be specified based on the localization features of the solution, user interest, and the correlation length of the random material properties. For problems where a region of interest is not evident, Bayesian inference can provide a feasible solution. In this work, we employ a Bayesian framework to update our prior knowledge of the localized region of interest using measurements and system response. To address the computational cost of the Bayesian inference, we construct a Gaussian process surrogate for the forward model. Once the localized region of interest is identified, we use polynomial chaos expansion to propagate the localization uncertainty. We demonstrate our framework through numerical experiments on a three-dimensional elastodynamic problem.

artificial intelligence, machine learning, (18 more...)

arXiv: 2008.06556

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre: Research Report (0.64)

Industry: Aerospace & Defense (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

VarFA: A Variational Factor Analysis Framework For Efficient Bayesian Learning Analytics

Wang, Zichao, Gu, Yi, Lan, Andrew, Baraniuk, Richard

arXiv.org Machine Learning, Aug-14-2020

We propose VarFA, a variational inference factor analysis framework that extends existing factor analysis models for educational data mining to efficiently output uncertainty estimates for the model's estimated factors. Such uncertainty information is useful, for example, in an adaptive testing scenario, where additional tests can be administered if the model is not sufficiently certain about its estimate of a student's skill level. Traditional Bayesian inference methods that produce such uncertainty information are computationally expensive and do not scale to large data sets. VarFA utilizes variational inference, which makes it possible to efficiently perform Bayesian inference even on very large data sets. We use the sparse factor analysis model as a case study and demonstrate the efficacy of VarFA on both synthetic and real data sets. VarFA is also very general and can be applied to a wide array of factor analysis models.

artificial intelligence, machine learning, student, (18 more...)

arXiv: 2005.13107

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > California (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

A statistical theory of cold posteriors in deep neural networks

Aitchison, Laurence

arXiv.org Machine Learning, Aug-13-2020

To get Bayesian neural networks to perform comparably to standard neural networks, it is usually necessary to artificially reduce uncertainty using a "tempered" or "cold" posterior. This is extremely concerning: if the prior is accurate, Bayesian inference/decision theory is optimal, and any artificial changes to the posterior should harm performance. While this suggests that the prior may be at fault, here we argue that in fact, BNNs for image classification use the wrong likelihood. In particular, standard image benchmark datasets such as CIFAR-10 are carefully curated. We develop a generative model describing curation which gives a principled Bayesian account of cold posteriors, because the likelihood under this new generative model closely matches the tempered likelihoods used in past work.
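
For readers unfamiliar with the terminology, one common way to write the two modified posteriors is shown below. The notation (parameters \theta, data \mathcal{D}, temperature T) is assumed here for illustration and is not taken from the abstract.

```latex
% Standard Bayesian posterior (T = 1)
p(\theta \mid \mathcal{D}) \propto p(\mathcal{D} \mid \theta)\, p(\theta)

% Tempered posterior: only the likelihood is raised to the power 1/T
p_{\text{tempered}}(\theta \mid \mathcal{D}) \propto p(\mathcal{D} \mid \theta)^{1/T}\, p(\theta)

% Cold posterior: likelihood and prior are both raised to the power 1/T
p_{\text{cold}}(\theta \mid \mathcal{D}) \propto \bigl( p(\mathcal{D} \mid \theta)\, p(\theta) \bigr)^{1/T},
\qquad 0 < T < 1
```

With T < 1, both forms concentrate probability mass around high-likelihood parameters, which is the artificial reduction in uncertainty the abstract refers to.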

cold posterior, neural network, posterior, (15 more...)

arXiv: 2008.05912

Country: Europe > United Kingdom > England > Bristol (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
