AITopics | Bayesian Inference

Collaborating Authors

Bayesian Inference

Bayes' Theorem allows a program to infer the probabilities of likely causes from the probabilities of their effects, when what it is given are the probabilities of effects, given the causes.

News Overviews Instructional Materials AI-Alerts Classics

Bayesian Analysis with Python – Second Edition

#artificialintelligenceOct-16-2019, 23:09:17 GMT

Bayesian Analysis with Python – Second Edition is a step-by-step guide to conduct Bayesian data analyses using PyMC3 and ArviZ. Description The second edition of Bayesian Analysis with Python is an introduction to the main concepts of applied Bayesian inference and its practical implementation in Python using PyMC3, a state-of-the-art probabilistic programming library, and ArviZ, a new library for exploratory analysis of Bayesian models.

artificial intelligence, bayesian analysis, bayesian inference, (3 more...)

#artificialintelligence

Genre: Instructional Material (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.70)

Add feedback

Practical Posterior Error Bounds from Variational Objectives

Huggins, Jonathan H., Kasprzak, Mikołaj, Campbell, Trevor, Broderick, Tamara

arXiv.org Machine LearningOct-16-2019

V ariational inference has become an increasingly attractive fast alternative to Markov chain Monte Carlo methods for approximate Bayesian inference. However, a major obstacle to the widespread use of variational methods is the lack of post-hoc accuracy measures that are both theoretically justified and computationally efficient. In this paper, we provide rigorous bounds on the error of posterior mean and uncertainty estimates that arise from full-distribution approximations, as in variational inference. Our bounds are widely applicable as they require only that the approximating and exact posteriors have polynomial moments. Our bounds are computationally efficient for variational inference in that they require only standard values from varia-tional objectives, straightforward analytic calculations, and simple Monte Carlo estimates. We show that our analysis naturally leads to a new and improved workflow for variational inference. Finally, we demonstrate the utility of our proposed workflow and error bounds on a real-data example with a widely used multilevel hierarchical model.

approximation, practical posterior error bound, variational inference, (11 more...)

arXiv.org Machine Learning

1910.04102

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > British Columbia (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Workflow (0.87)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)

Add feedback

Bayesian nightmare. Solved!

#artificialintelligenceOct-15-2019, 12:50:48 GMT

Who has not heard that Bayesian statistics are difficult, computationally slow, cannot scale-up to big data, the results are subjective; and we don't need it at all? Do we really need to learn a lot of math and a lot of classical statistics first before approaching Bayesian techniques. Why do the most popular books about Bayesian statistics have over 500 pages? Bayesian nightmare is real or myth? Someone once compared Bayesian approach to the kitchen of a Michelin star chef with high-quality chef knife, a stockpot and an expensive sautee pan; while Frequentism is like your ordinary kitchen, with banana slicers and pasta pots. People talk about Bayesianism and Frequentism as if they were two different religions. Does Bayes really put more burden on the data scientist to use her brain at the outset because Bayesianism is a religion for the brightest of the brightest?

posterior distribution, probability, signup rate, (15 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions

Buesing, Lars, Heess, Nicolas, Weber, Theophane

arXiv.org Artificial IntelligenceOct-15-2019

A plethora of problems in AI, engineering and the sciences are naturally formalized as inference in discrete probabilistic models. Exact inference is often prohibitively expensive, as it may require evaluating the (unnormalized) target density on its entire domain. Here we consider the setting where only a limited budget of calls to the unnormalized density oracle is available, raising the challenge of where in the domain to allocate these function calls in order to construct a good approximate solution. We formulate this problem as an instance of sequential decision-making under uncertainty and leverage methods from reinforcement learning for probabilistic inference with budget constraints. In particular, we propose the TreeSample algorithm, an adaptation of Monte Carlo Tree Search to approximate inference. This algorithm caches all previous queries to the density oracle in an explicit search tree, and dynamically allocates new queries based on a "best-first" heuristic for exploration, using existing upper confidence bound methods. Our non-parametric inference method can be effectively combined with neural networks that compile approximate conditionals of the target, which are then used to guide the inference search and enable generalization across multiple target distributions. We show empirically that TreeSample outperforms standard approximate inference methods on synthetic factor graphs.

artificial intelligence, bayesian inference, inference, (18 more...)

arXiv.org Artificial Intelligence

1910.06862

Country: Europe > United Kingdom > England (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

Learning Sample-Specific Models with Low-Rank Personalized Regression

Lengerich, Benjamin, Aragam, Bryon, Xing, Eric P.

arXiv.org Machine LearningOct-15-2019

Modern applications of machine learning (ML) deal with increasingly heterogeneous datasets comprised of data collected from overlapping latent subpopulations. As a result, traditional models trained over large datasets may fail to recognize highly predictive localized effects in favour of weakly predictive global patterns. This is a problem because localized effects are critical to developing individualized policies and treatment plans in applications ranging from precision medicine to advertising. To address this challenge, we propose to estimate sample-specific models that tailor inference and prediction at the individual level. In contrast to classical ML models that estimate a single, complex model (or only a few complex models), our approach produces a model personalized to each sample. These sample-specific models can be studied to understand subgroup dynamics that go beyond coarse-grained class labels. Crucially, our approach does not assume that relationships between samples (e.g. a similarity network) are known a priori. Instead, we use unmodeled covariates to learn a latent distance metric over the samples. We apply this approach to financial, biomedical, and electoral data as well as simulated data and show that sample-specific models provide fine-grained interpretations of complicated phenomena without sacrificing predictive accuracy compared to state-of-the-art models such as deep neural networks.

dataset, prediction, regression, (16 more...)

arXiv.org Machine Learning

1910.06939

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Indiana (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(3 more...)

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Government > Voting & Elections (1.00)
Banking & Finance (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

Extracting robust and accurate features via a robust information bottleneck

Pensia, Ankit, Jog, Varun, Loh, Po-Ling

arXiv.org Machine LearningOct-15-2019

We propose a novel strategy for extracting features in supervised learning that can be used to construct a classifier which is more robust to small perturbations in the input space. Our method builds upon the idea of the information bottleneck by introducing an additional penalty term that encourages the Fisher information of the extracted features to be small, when parametrized by the inputs. By tuning the regularization parameter, we can explicitly trade off the opposing desiderata of robustness and accuracy when constructing a classifier. We derive the optimal solution to the robust information bottleneck when the inputs and outputs are jointly Gaussian, proving that the optimally robust features are also jointly Gaussian in that setting. Furthermore, we propose a method for optimizing a variational bound on the robust information bottleneck objective in general settings using stochastic gradient descent, which may be implemented efficiently in neural networks. Our experimental results for synthetic and real data sets show that the proposed feature extraction method indeed produces classifiers with increased robustness to perturbations.

inequality, information, robustness, (16 more...)

arXiv.org Machine Learning

1910.06893

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
(2 more...)

Add feedback

Counterfactual diagnosis

Richens, Jonathan G., Lee, Ciaran M., Johri, Saurabh

arXiv.org Artificial IntelligenceOct-15-2019

Causal knowledge is vital for effective reasoning in science and medicine. In medical diagnosis for example, a doctor aims to explain a patient's symptoms by determining the diseases causing them. However, all previous approaches to Machine-Learning assisted diagnosis, including Deep Learning and model-based Bayesian approaches, learn by association and do not distinguish correlation from causation. Here, we propose a new diagnostic algorithm based on counterfactual inference which captures the causal aspect of diagnosis overlooked by previous approaches. Using a statistical disease model, which describes the relations between hundreds of diseases, symptoms and risk factors, we compare our counterfactual algorithm to the standard Bayesian diagnostic algorithm, and test these against a cohort of 44 doctors. We use 1763 clinical vignettes created by a separate panel of doctors to benchmark performance. Each vignette provides a non-exhaustive list of symptoms and medical history simulating a single presentation of a disease. The algorithms and doctors are tasked with determining the underlying disease for each vignette from symptom and medical history information alone. While the Bayesian algorithm achieves the accuracy comparable to the average doctor, placing in the top 49\% of doctors in our cohort, our counterfactual algorithm places in the top 20\% of doctors, achieving expert clinical accuracy. Our results demonstrate the advantage of counterfactual over associative reasoning in a complex real-world task, and show that counterfactual reasoning is a vital missing ingredient for applying machine learning to medical diagnosis.

algorithm, diagnosis, symptom, (17 more...)

arXiv.org Artificial Intelligence

1910.06772

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
North America > Greenland (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Bayesian Temporal Factorization for Multidimensional Time Series Prediction

Sun, Lijun, Chen, Xinyu

arXiv.org Machine LearningOct-14-2019

Abstract--Large-scale and multidimensional spatiotemporal data sets are becoming ubiquitous in many real-world applications such as monitoring urban traffic and air quality . Making predictions on these time series has become a critical challenge due to not only the large-scale and high-dimensional nature but also the considerable amount of missing data. In this paper, we propose a Bayesian temporal factorization (BTF) framework for modeling multidimensional time series--in particular spatiotemporal data--in the presence of missing values. By integrating low-rank matrix/tensor factorization and vector autoregressive (VAR) process into a single probabilistic graphical model, this framework can characterize both global and local consistencies in large-scale time series data. The graphical model allows us to effectively perform probabilistic predictions and produce uncertainty estimates without imputing those missing values. We develop efficient Gibbs sampling algorithms for model inference and test the proposed BTF framework on several real-world spatiotemporal data sets for both missing data imputation and short-term/long-term rolling prediction tasks. The numerical experiments demonstrate the superiority of the proposed BTF approaches over many state-of-the-art techniques. With recent advances in sensing technologies, large-scale and multidimensional time series data--in particular spatiotemporal data--are collected on a continuous basis from various types of sensors and applications. Making predictions on these time series, such as forecasting urban traffic states and regional air quality, serves as a foundation to many real-world applications and benefits many scientific fields [1], [2]. For example, predicting the demand and states (e.g., speed, flow) of urban traffic is essential to a wide range of intelligent transportation systems (ITS) applications, such trip planning, travel time estimation, route planning, traffic signal control, to name but a few [3]. However, given the complex spatiotemporal dependencies in these data sets, making efficient and reliable predictions for real-time applications has been a longstanding and fundamental research challenge. Despite the vast body of literature on time series analysis from many scientific areas, three emerging issues in modern sensing technologies are constantly challenging the classical modeling frameworks. First, modern time series data are often large-scale, collected from a large number of subjects/locations/sensors simultaneously .

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

1910.06366

Country:

North America > Canada > Quebec > Montreal (0.14)
Asia > China > Guangdong Province > Guangzhou (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
(4 more...)

Genre: Research Report > Promising Solution (0.48)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

MIM: Mutual Information Machine

Livne, Micha, Swersky, Kevin, Fleet, David J.

arXiv.org Machine LearningOct-14-2019

We introduce the Mutual Information Machine (MIM), an autoencoder model for learning joint distributions over observations and latent states. The model formulation reflects two key design principles: 1) symmetry, to encourage the encoder and decoder to learn consistent factorizations of the same underlying distribution; and 2) mutual information, to encourage the learning of useful representations for downstream tasks. The objective comprises the Jensen-Shannon divergence between the encoding and decoding joint distributions, plus a mutual information term. We show that this objective can be bounded by a tractable cross-entropy loss between the true model and a parameterized approximation, and relate this to maximum likelihood estimation and variational autoencoders. Experiments show that MIM is capable of learning a latent representation with high mutual information, and good unsupervised clustering, while providing data log likelihood comparable to VAE (with a sufficiently expressive architecture).

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

1910.03175

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback

PROFET: Construction and Inference of DBNs Based on Mathematical Models

Ajmal, Hamda, Madden, Michael, Enright, Catherine

arXiv.org Machine LearningOct-14-2019

PROFET: Construction and Inference of DBNs Based on Mathematical Models Hamda Ajmal, Michael Madden and Catherine Enright School of Computer Science, National University of Ireland Galway h.ajmal1@nuigalway.ie, Abstract This paper presents, evaluates, and discusses a new software tool to automatically build Dynamic Bayesian Networks (DBNs) from ordinary differential equations (ODEs) entered by the user. The DBNs generated from ODE models can handle both data uncertainty and model uncertainty in a principled manner. The application, named PROFET, can be used for temporal data mining with noisy or missing variables. It enables automatic re-estimation of model parameters using temporal evidence in the form of data streams. For temporal inference, PROFET includes both standard fixed time step particle filtering and its extension, adaptive-time particle filtering algorithms. Adaptive-time particle filtering enables the DBN to automatically adapt its time step length to match the dynamics of the model. We demonstrate PROFET's functionality by using it to infer the model variables by estimating the model parameters of four benchmark ODE systems. From the generation of the DBN model to temporal inference, the entire process is automated and is delivered as an open-source platform-independent software application with a comprehensive user interface. PROFET is released under the Apache License 2.0. Its source code, executable and documentation are available at http:://profet.

inference, model parameter, ode model, (16 more...)

arXiv.org Machine Learning

1910.04895

Country:

Europe > Ireland (0.24)
Europe > Czechia > Prague (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback