AITopics | Bayesian Inference

Collaborating Authors

Bayesian Inference

Bayes' Theorem allows a program to infer the probabilities of likely causes from the probabilities of their effects, when what it is given are the probabilities of effects, given the causes.

News Overviews Instructional Materials AI-Alerts Classics

Semi-Separable Hamiltonian Monte Carlo for Inference in Bayesian Hierarchical Models

Zhang, Yichuan, Sutton, Charles

Neural Information Processing SystemsFeb-14-2020, 04:44:51 GMT

Sampling from hierarchical Bayesian models is often difficult for MCMC methods, because of the strong correlations between the model parameters and the hyperparameters. Recent Riemannian manifold Hamiltonian Monte Carlo (RMHMC) methods have significant potential advantages in this setting, but are computationally expensive. We introduce a new RMHMC method, which we call semi-separable Hamiltonian Monte Carlo, which uses a specially designed mass matrix that allows the joint Hamiltonian over model parameters and hyperparameters to decompose into two simpler Hamiltonians. This structure is exploited by a new integrator which we call the alternating blockwise leapfrog algorithm. The resulting method can mix faster than simpler Gibbs sampling while being simpler and more efficient than previous instances of RMHMC.

bayesian hierarchical model, hamiltonian monte carlo, semi-separable hamiltonian monte carlo, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much

He, Bryan D., Sa, Christopher M. De, Mitliagkas, Ioannis, Ré, Christopher

Neural Information Processing SystemsFeb-14-2020, 04:44:50 GMT

Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions. There are two common scan orders for the variables: random scan and systematic scan. Due to the benefits of locality in hardware, systematic scan is commonly used, even though most statistical guarantees are only for random scan. While it has been conjectured that the mixing times of random scan and systematic scan do not differ by more than a logarithmic factor, we show by counterexample that this is not the case, and we prove that that the mixing times do not differ by more than a polynomial factor under mild conditions. To prove these relative bounds, we introduce a method of augmenting the state space to study systematic scan using conductance.

gibbs sampling, matter and bound, systematic scan, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

Bayesian Learning of Causal Relationships for System Reliability

Yu, Xuewen, Smith, Jim Q., Nichols, Linda

arXiv.org Artificial IntelligenceFeb-14-2020

Concurrently advances in causal modelling has led to a better understanding of how to predict complex systems when these are subjected to control. A major breakthrough then occurred about 20 years ago when it was then discovered that causal and graphical modelling could be combined. This paper reports on how the combination of these two technologies are being applied to give more insights concerning causal hypotheses that relate specially to reliability and system safety. Of course causal ideas have been embedded within reliability theory for a long time, both to explore the reasons behind a failure and to also estimate the efficacy of various interventions in the system that might ameliorate adverse outcomes. However, the graphical frameworks around which these ideas appear have been tree like - for example fault trees or event chains (Leverson 2011) - rather than the most common graphical framework for analyzing causation: the BN. Only recently have graphical causal methods emerged for methodologies and algorithms to exist for causal discovery and causal reasoning on such tree structures. The primary such graph is the chain event graph (CEG), see Thwaites, Smith and Riccomagno (2010), Barclay, Hutton and Smith (2013) and Collazo et.

artificial intelligence, bayesian inference, intervention, (18 more...)

arXiv.org Artificial Intelligence

2002.06084

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Domain Adaptation As a Problem of Inference on Graphical Models

Zhang, Kun, Gong, Mingming, Stojanov, Petar, Huang, Biwei, Glymour, Clark

arXiv.org Machine LearningFeb-14-2020

This paper is concerned with data-driven unsupervised domain adaptation, where it is unknown in advance how the joint distribution changes across domains, i.e., what factors or modules of the data distribution remain invariant or change across domains. To develop an automated way of domain adaptation with multiple source domains, we propose to use a graphical model as a compact way to encode the change property of the joint distribution, which can be learned from data, and then view domain adaptation as a problem of Bayesian inference on the graphical models. Such a graphical model distinguishes between constant and varied modules of the distribution and specifies the properties of the changes across domains, which serves as prior knowledge of the changing modules for the purpose of deriving the posterior of the target variable $Y$ in the target domain. This provides an end-to-end framework of domain adaptation, in which additional knowledge about how the joint distribution changes, if available, can be directly incorporated to improve the graphical representation. We discuss how causality-based domain adaptation can be put under this umbrella. Experimental results on both synthetic and real data demonstrate the efficacy of the proposed framework for domain adaptation.

artificial intelligence, conditional distribution, machine learning, (17 more...)

arXiv.org Machine Learning

2002.03278

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Systems & Languages (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)

Add feedback

Cutting out the Middle-Man: Training and Evaluating Energy-Based Models without Sampling

Grathwohl, Will, Wang, Kuan-Chieh, Jacobsen, Jorn-Henrik, Duvenaud, David, Zemel, Richard

arXiv.org Machine LearningFeb-14-2020

We present a new method for evaluating and training unnormalized density models. Our approach only requires access to the gradient of the unnormalized model's log-density. We estimate the Stein discrepancy between the data density p(x) and the model density q(x) defined by a vector function of the data. We parameterize this function with a neural network and fit its parameters to maximize the discrepancy. This yields a novel goodness-of-fit test which outperforms existing methods on high dimensional data. Furthermore, optimizing $q(x)$ to minimize this discrepancy produces a novel method for training unnormalized models which scales more gracefully than existing methods. The ability to both learn and compare models is a unique feature of the proposed method.

arxiv preprint arxiv, lsd, sampler, (13 more...)

arXiv.org Machine Learning

2002.05616

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Optimal estimation of high-dimensional Gaussian mixtures

Doss, Natalie, Wu, Yihong, Yang, Pengkun, Zhou, Harrison H.

arXiv.org Machine LearningFeb-13-2020

This paper studies the optimal rate of estimation in a finite Gaussian location mixture model in high dimensions without separation conditions. We assume that the number of components $k$ is bounded and that the centers lie in a ball of bounded radius, while allowing the dimension $d$ to be as large as the sample size $n$. Extending the one-dimensional result of Heinrich and Kahn \cite{HK2015}, we show that the minimax rate of estimating the mixing distribution in Wasserstein distance is $\Theta((d/n)^{1/4} + n^{-1/(4k-2)})$, achieved by an estimator computable in time $O(nd^2+n^{5/4})$. Furthermore, we show that the mixture density can be estimated at the optimal parametric rate $\Theta(\sqrt{d/n})$ in Hellinger distance; however, no computationally efficient algorithm is known to achieve the optimal rate. Both the theoretical and methodological development rely on a careful application of the method of moments. Central to our results is the observation that the information geometry of finite Gaussian mixtures is characterized by the moment tensors of the mixing distribution, whose low-rank structure can be exploited to obtain a sharp local entropy bound.

algorithm, dimension, estimation, (17 more...)

arXiv.org Machine Learning

2002.05818

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Fast Convergence for Langevin Diffusion with Matrix Manifold Structure

Moitra, Ankur, Risteski, Andrej

arXiv.org Machine LearningFeb-13-2020

In this paper, we study the problem of sampling from distributions of the form p(x) \propto e^{-\beta f(x)} for some function f whose values and gradients we can query. This mode of access to f is natural in the scenarios in which such problems arise, for instance sampling from posteriors in parametric Bayesian models. Classical results show that a natural random walk, Langevin diffusion, mixes rapidly when f is convex. Unfortunately, even in simple examples, the applications listed above will entail working with functions f that are nonconvex -- for which sampling from p may in general require an exponential number of queries. In this paper, we study one aspect of nonconvexity relevant for modern machine learning applications: existence of invariances (symmetries) in the function f, as a result of which the distribution p will have manifolds of points with equal probability. We give a recipe for proving mixing time bounds of Langevin dynamics in order to sample from manifolds of local optima of the function f in settings where the distribution is well-concentrated around them. We specialize our arguments to classic matrix factorization-like Bayesian inference problems where we get noisy measurements A(XX^T), X \in R^{d \times k} of a low-rank matrix, i.e. f(X) = \|A(XX^T) - b\|^2_2, X \in R^{d \times k}, and \beta the inverse of the variance of the noise. Such functions f are invariant under orthogonal transformations, and include problems like matrix factorization, sensing, completion. Beyond sampling, Langevin dynamics is a popular toy model for studying stochastic gradient descent. Along these lines, we believe that our work is an important first step towards understanding how SGD behaves when there is a high degree of symmetry in the space of parameters the produce the same output.

inequality, manifold, matrix, (15 more...)

arXiv.org Machine Learning

2002.05576

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees

Rothfuss, Jonas, Fortuin, Vincent, Krause, Andreas

arXiv.org Machine LearningFeb-13-2020

Meta-learning can successfully acquire useful inductive biases from data, especially when a large number of meta-tasks are available. Yet, its generalization properties to unseen tasks are poorly understood. Particularly if the number of meta-tasks is small, this raises concerns for potential overfitting. We provide a theoretical analysis using the PAC-Bayesian framework and derive novel generalization bounds for meta-learning with unbounded loss functions and Bayesian base learners. Using these bounds, we develop a class of PAC-optimal meta-learning algorithms with performance guarantees and a principled meta-regularization. When instantiating our PAC-optimal hyper-posterior (PACOH) with Gaussian processes as base learners, the resulting approach consistently outperforms several popular meta-learning methods, both in terms of predictive accuracy and the quality of its uncertainty estimates.

bayes-optimal meta-learning, meta-learning, pacoh, (15 more...)

arXiv.org Machine Learning

2002.05551

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Nonasymptotic analysis of Stochastic Gradient Hamiltonian Monte Carlo under local conditions for nonconvex optimization

Akyildiz, Ömer Deniz, Sabanis, Sotirios

arXiv.org Machine LearningFeb-13-2020

This problem arises in many cases in machine learning, most notably in large-scale (mini-batch) Bayesian inference (Welling and Teh, 2011, Ahn et al., 2012) and nonconvex stochastic optimization (Raginsky et al., 2017). For the setting of Bayesian inference, one is interested in sampling from a posterior probability measure where U corresponds to the sum of the log-likelihood and the log-prior. For the nonconvex optimization, U(·) is the nonconvex cost function to be minimized. For large values ofβ, a sample from the target measure (1) is an approximate minimizer of the potential U (Raginsky et al., 2017). Consequently, nonasymptotic error bounds for the schemes, which are designed to sample from (1), can be used to obtain guarantees for Bayesian inference or nonconvex optimization. Sampling from a measure of the form (1) is also central in statistical physics (Binder et al., 1993), most notably in molecular dynamics Haile (1992).

assumption 2, chau and rasonyi, theorem 2, (8 more...)

arXiv.org Machine Learning

2002.05465

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > West Midlands > Coventry (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.74)

Add feedback

The Big Three: A Methodology to Increase Data Science ROI by Answering the Questions Companies Care About

Griffin, Daniel K.

arXiv.org Machine LearningFeb-12-2020

Companies may be achieving only a third of the value they could be getting from data science in industry applications. In this paper, we propose a methodology for categorizing and answering 'The Big Three' questions (what is going on, what is causing it, and what actions can I take that will optimize what I care about) using data science. The applications of data science seem to be nearly endless in today's modern landscape, with each company jockeying for position in the new data and insights economy. Yet, data scientists seem to be solely focused on using classification, regression, and clustering methods to answer the question 'what is going on'. Answering questions about why things are happening or how to take optimal actions to improve metrics are relegated to niche fields of research and generally neglected in industry data science analysis. We survey technical methods to answer these other important questions, describe areas in which some of these methods are being applied, and provide a practical example of how to apply our methodology and selected methods to a real business use case.

application, information, reinforcement, (16 more...)

arXiv.org Machine Learning

2002.07069

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(7 more...)

Genre:

Research Report > Experimental Study (1.00)
Overview (0.92)

Industry:

Leisure & Entertainment > Games (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Information Technology > Services (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(3 more...)

Add feedback